<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>IDDM-</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>National Aviation University</institution>
          ,
          <addr-line>1 Lubomyra Huzara Ave, Kyiv, 03058</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>5</volume>
      <fpage>18</fpage>
      <lpage>20</lpage>
      <abstract>
<p>Modeling and analysis of small data raise essential problems and challenges stemming from insufficient sampling of unknown observable distributions, complicating confident analysis and often reducing the statistical confidence of the conclusions. In this work, an original approach to the analysis of small data is proposed, based on an ensemble of generative neural network models, with the intent of identifying stable clusters of data in informative generative representations. We demonstrate how the characteristic structure of stable clusters in generative representations of a dataset of images of basic geometric shapes can be determined from representations produced by a generative ensemble. The method can be used to identify characteristic structure, perform correlation analysis, augment data of different types and, under conditions discussed in this work, improve the performance of supervised classification in cases with a deficit of training data.</p>
      </abstract>
      <kwd-group>
        <kwd>Unsupervised learning</kwd>
        <kwd>ensemble learning</kwd>
        <kwd>clustering</kwd>
        <kwd>statistical analysis</kwd>
        <kwd>small data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Modeling and analysis of small data raise essential problems and challenges stemming from
insufficient sampling of unknown observable distributions, complicating confident
analysis and often lowering the statistical confidence of the conclusions. Nevertheless, early
analysis of structure and trends in emerging data can be essential in situations and events of a novel or
rare nature, where large volumes of confident data may not be available for any
reason [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Among the acknowledged challenges in practical applications of methods of machine intelligence in the
analysis of small data are those of stability of learning and of the produced results. It can be observed, for
example, as a strong dependency of learning success on the choice of training parameters, the
selection and temporal ordering of batches, and other training factors. Issues that have been noted [
        <xref ref-type="bibr" rid="ref2 ref3">2,3</xref>
        ]
include reproducibility of the results, overfitting, inability to generalize, and others. Owing to these
challenges, results produced by models of similar architecture with the same datasets can be
inconsistent and volatile, and the ability to generalize characteristic patterns, more limited than in
conventional applications. Not least, reproducibility of the results, which is essential in
establishing confidence in the methods, can be less certain, significantly complicating comparison of
methods and models.
      </p>
      <p>
        Numerous efforts have examined the problem of stability of learning with small data, and a
number of promising approaches and directions have been described, including: cross-validation; ensemble
methods [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]; Radial-Basis Function (RBF) networks [
        <xref ref-type="bibr" rid="ref5 ref6">5,6</xref>
        ] and other methods [
        <xref ref-type="bibr" rid="ref7 ref8">7,8</xref>
        ]. However, whereas
some of these methods have shown success in a number of specific applications, their generality and applicability
to different types of analyzed data and problems could not, to the best of our knowledge, be
established due to specialized structure, architecture and essential assumptions about the distributions.
      </p>
      <p>
        In parallel to these developments, methods of unsupervised generative learning [
        <xref ref-type="bibr" rid="ref10 ref9">9,10</xref>
        ]
demonstrated an effective ability to achieve significant simplification of complex data via
reduction of redundancy in the observable parameters and
identification (extraction) of informative features. In a growing number of instances, these methods
were instrumental in the analysis of patterns in complex real-world data [
        <xref ref-type="bibr" rid="ref11 ref12">11,12</xref>
        ] including data
strongly constrained by the size of the sample [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Critically for success in the identified problem
area, application of such methods is not limited by availability of prior data including labeled datasets,
and in many cases can be successful with smaller samples than conventional methods of supervised
learning. These traits make methods of unsupervised generative learning good candidates for analysis
of data constrained by both size and availability of confident prior knowledge, without precluding
aggregation of confidently known data for subsequent analysis with conventional methods.
      </p>
      <p>
        To address the challenges outlined above, we propose an ensemble approach [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] based on a
collective of unsupervised generative neural networks to identify stable patterns and structure in the
observable (training) data, which simultaneously addresses both the deficit of known labels
and the stability of learning. Stable structures in the informative low-dimensional representations of
observable data produced by generative models can be identified via a process that was developed and
used for several purposes, including correlation analysis by factors of interest and augmentation of small
data by generating new data points from the identified characteristic latent structure. In
contrast to some of the methods mentioned earlier, this approach does not depend strongly
on specific assumptions about observable distributions and can be used in a generic manner with data
of different types and origins.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Methods</title>
      <p>An ensemble of generative neural network models with the architecture of a deep convolutional
autoencoder with a strongly compressed representation layer was used to produce two- or
three-dimensional representations of small datasets of images of basic geometric shapes.</p>
      <p>
        The advantage of the proposed architecture stems from previous applications in producing
informative low-dimensional representations of complex data, as well as from the universal approximation
capacity of neural network models [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], making them a useful and versatile tool in modeling diverse
distributions of complex data.
      </p>
      <p>Following successful generative training of the models in the ensemble, embedded distributions of
the evaluated datasets were produced in the spaces of latent coordinates. We then attempted to
identify stable populations of recognizable latent structures, such as visually identifiable clusters. The
resulting structure of identified stable clusters had to be invariant with respect to the individual trained
model and therefore represent innate characteristics of the input data, which was possible to verify due to the
known composition of the dataset.</p>
    </sec>
    <sec id="sec-3">
      <title>2.1. Generative Neural Network Architecture</title>
      <p>
        A deep convolutional autoencoder neural network [
        <xref ref-type="bibr" rid="ref16 ref17">16,17</xref>
        ] had an input layer of dimension p and 2-3
convolutional layers, common in the practice of learning visual data. Models had convolutional layers
for acquisition of visual features, one deep layer and a central encoding layer of size d, creating
two- or three-dimensional (i.e., d = 2, 3) latent representations of the input data defined by activation values
of the latent neurons in the encoding layer.
      </p>
      <p>
        Overall, generative autoencoder models in this work had 48,000 – 96,000 trainable parameters
depending on configuration of layers. The decoding or generating stage was fully symmetrical to the
encoder. The models were implemented in TensorFlow / Keras [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] with data processing, plotting and
visualization Python packages used in the analysis of the results.
      </p>
      <p>An architecture diagram of the model is shown in Figure 1.</p>
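      <p>As an illustration, a minimal Keras sketch of an autoencoder of this class is given below; the exact filter counts, layer sizes and activations are assumptions, since the original configuration is described only in outline.</p>
      <preformat>
from tensorflow.keras import layers, models

p, d = 64, 2    # 64 x 64 grayscale inputs (4,096 parameters), 2D encoding layer

inputs = layers.Input(shape=(p, p, 1))
x = layers.Conv2D(8, 3, strides=2, padding="same", activation="relu")(inputs)
x = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(x)
x = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(x)
x = layers.Flatten()(x)                       # 8 x 8 x 16 -> 1,024
x = layers.Dense(32, activation="relu")(x)    # deep layer
latent = layers.Dense(d, name="encoding")(x)  # central encoding layer

y = layers.Dense(32, activation="relu")(latent)      # decoder, symmetrical
y = layers.Dense(8 * 8 * 16, activation="relu")(y)
y = layers.Reshape((8, 8, 16))(y)
y = layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu")(y)
y = layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu")(y)
y = layers.Conv2DTranspose(8, 3, strides=2, padding="same", activation="relu")(y)
outputs = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(y)

autoencoder = models.Model(inputs, outputs)
encoder = models.Model(inputs, latent)        # E: observable -> latent
autoencoder.compile(optimizer="adam", loss="mse")
# ~76,000 trainable parameters, within the 48,000-96,000 range reported above.
      </preformat>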
    </sec>
    <sec id="sec-4">
      <title>2.3. Unsupervised Ensemble Learning</title>
      <p>We decided to approach the question of stability of learning with small data with a set (an
ensemble) of generative neural network models that do not require prior knowledge, including in the
form of labeled data, for successful training. An ensemble of trained generative models of a size n
thus produced an array of two-dimensional representations of the input data as shown in Figure 3.</p>
      <p>As a result of this phase of unsupervised generative ensemble learning, a set of pairs was produced:
R = { (trained model, map(input data point, 2D latent position)) } × n. Association of a unique id with the
latent position of an input data point (Figure 2, e1) relative to those of the other points in the set
allowed stable clusters K in the input data to be identified by an entirely unsupervised process as follows
(pseudocode, where D is the input dataset; E, the encoding phase of the generative model, so that e(x) = E(x) is the
encoded (latent) image of an observable data point; and m, the number of identified clusters):
for x(k) in D:
    if e(k) = E(x(k)) is in a known cluster Kl: K(x) = Kl (known cluster)
    else if conf(e(k), L) &gt; γ: K(x) = L; m := m + 1 (a new cluster)
    else: K(x) = A (not in a cluster)
where A is an arbitrary id for elements with uncertain cluster association and γ is the confidence of identification of the
latent position e(k) as belonging to a cluster L. The process can be described as follows: if the next element
falls in the same cluster as an earlier one (i.e., one with a lower sequential id in the dataset), the cluster of
that element is assigned; if a new cluster can be identified with confidence above γ, a new cluster id is
created and assigned; finally, if neither condition is satisfied, an arbitrary constant id is assigned, and the process is
repeated until the dataset is exhausted. The process is deterministic and, as can be seen, does not
depend on the selection of the ordering sequence, as long as the same sequence is maintained throughout the
process.</p>
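      <p>A runnable sketch of this assignment pass is given below; the distance-based confidence measure and its parameters are illustrative assumptions standing in for the confidence criterion γ.</p>
      <preformat>
import numpy as np

def assign_clusters(latent, gamma=0.9, radius=1.0, uncertain_id=-1):
    """One deterministic pass over latent positions, assigning cluster ids
    as in the pseudocode above (illustrative sketch)."""
    ids = np.full(len(latent), uncertain_id)
    centers = []                                  # running cluster centers

    def confidence(e, center):
        # Illustrative confidence: decays with distance from the center.
        return float(np.exp(-np.linalg.norm(e - center) / radius))

    m = 0                                         # number of identified clusters
    for k, e in enumerate(latent):
        scores = [confidence(e, c) for c in centers]
        if scores and max(scores) > gamma:
            ids[k] = int(np.argmax(scores))       # known cluster
        elif not scores or 1.0 - gamma > max(scores):
            centers.append(e.copy())              # a confidently new cluster
            ids[k] = m
            m += 1
        # otherwise: uncertain association keeps the arbitrary constant id
    return ids

# Usage: latent = encoder.predict(D) gives an (M x d) array for dataset D.
      </preformat>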
      <p>The result is a matrix K(D) of (data point, cluster id) pairs of dimension (M × n), where M is
the size of the dataset and n, the size of the ensemble of generative models, and where points in the same
cluster (including uncertain cluster association) have the same cluster id.</p>
      <p>In the final step one can obtain the set of stable clusters Ks(D) identified by the ensemble as a
subset of K(D) satisfying certain confidence criteria cs:
Ks(D) = { K ∈ K(D) : conf(K) ≥ cs } (1)</p>
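      <p>A sketch of this selection is given below; measuring an element's ensemble agreement as the fraction of models that vote for its majority cluster id is an assumption, consistent with the example that follows.</p>
      <preformat>
import numpy as np

def stable_clusters(K, c_s=0.9):
    """K: (M x n) matrix of cluster ids, one column per ensemble model.
    Returns the majority id per element and a mask of elements whose
    ensemble agreement satisfies the confidence criterion c_s (sketch)."""
    M, n = K.shape
    majority = np.empty(M, dtype=int)
    agreement = np.empty(M)
    for i in range(M):
        vals, counts = np.unique(K[i], return_counts=True)
        majority[i] = vals[np.argmax(counts)]     # most frequent cluster id
        agreement[i] = counts.max() / n           # fraction of models in accord
    return majority, agreement >= c_s
      </preformat>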
      <p>For example, if the correlation of K(x) in the matrix K(D) was found to be 0.9 (i.e., 9 out of 10 models
in the ensemble produced the same cluster id for a given element) and the size of the ensemble was 20,
then the 95% confidence interval for the correlation coefficient of the element and the cluster would
be [0.76, 0.96], indicating a strong and confident association of the element with the cluster.</p>
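      <p>The quoted interval is consistent with the standard Fisher z-transformation for a correlation coefficient; a minimal check, under the assumption that this is the interval estimator used:</p>
      <preformat>
import math

def fisher_ci(r, n, z_crit=1.96):
    """95% confidence interval for a correlation coefficient r observed
    on a sample of size n, via the Fisher z-transformation."""
    z = math.atanh(r)              # transform r to an approximately normal variable
    se = 1.0 / math.sqrt(n - 3)    # standard error in the transformed space
    return (math.tanh(z - z_crit * se), math.tanh(z + z_crit * se))

print(fisher_ci(0.9, 20))          # ~(0.76, 0.96), as quoted in the text
      </preformat>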
      <p>The resulting subset of stable clusters Ks(D) identified in the described process can be used in an
analysis of composition of the input data and a number of other applications as discussed in the
subsequent sections. It may be worth reiterating that at no point in the analysis were any true known
samples of classes in the input dataset used.</p>
    </sec>
    <sec id="sec-5">
      <title>3. Results</title>
    </sec>
    <sec id="sec-6">
      <title>3.1. Cluster Structure</title>
      <p>
        The results of evaluation of the cluster structure with the datasets of images in the process described
in the preceding section are presented in Table 2 and Table 3. Identification of clusters was performed
by visual analysis (examples in Figure 3), which demonstrates both the stability and accuracy of the
identified cluster structure with the data in the study. In the future, unsupervised clustering methods
such as DBSCAN [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], mean shift and others can be applied to identify stable cluster structure in an
automated unsupervised process.
      </p>
      <table-wrap id="tbl2">
        <label>Table 2</label>
        <caption>
          <p>Cluster composition, G-150 and G-300 datasets, n(1) = 20</p>
        </caption>
        <table>
          <thead>
            <tr><th>Dataset</th><th>G-150</th><th>G-300</th></tr>
          </thead>
          <tbody>
            <tr><td>Number of clusters</td><td>3</td><td>3</td></tr>
            <tr><td>Clustered fraction(2) at conf = 0.8</td><td>1.0</td><td>1.0</td></tr>
            <tr><td>Clustered fraction at conf = 0.95</td><td>0.92</td><td>0.96</td></tr>
            <tr><td>Visible cluster separation</td><td>high</td><td>high, very high</td></tr>
          </tbody>
        </table>
        <table-wrap-foot>
          <p>(1) Ensemble size</p>
          <p>(2) Fraction of the dataset in identified stable clusters, at confidence level</p>
        </table-wrap-foot>
      </table-wrap>
      <table-wrap id="tbl3">
        <label>Table 3</label>
        <caption>
          <p>Cluster confusion matrix, G-150 dataset</p>
        </caption>
        <table>
          <thead>
            <tr><th>Cluster, type</th><th>Cluster 1</th><th>Cluster 2</th><th>Cluster 3</th></tr>
          </thead>
          <tbody>
            <tr><td>0 (circle)</td><td>1.0</td><td>0.</td><td>0.</td></tr>
            <tr><td>1 (triangle)</td><td>0.25</td><td>0.75</td><td>0.</td></tr>
            <tr><td>2 (background)</td><td>0.</td><td>0.15</td><td>0.85</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>With the larger unsupervised dataset, from G-150 to G-300, significantly improved accuracy of the
cluster-to-type association was observed, rising to the level of 95–100%. Stability of the latent
structure of clusters is a key observation and a necessary requirement for successful generation of new
data, confirming that clusters identified by the unsupervised generative ensemble method indeed
described stable characteristic patterns in the observable data.</p>
    </sec>
    <sec id="sec-7">
      <title>3.2. Generation and Prototypes</title>
      <p>The architecture of generative models provides a direct way to propagate positions in the latent
space of generative models to the space of input (observable) parameters. The mapping can be
obtained by taking a latent position with coordinates p = (l1, l2) as input to the generator component of
the model (Figure 1) to produce an observable position Xobs:
Xobs = G(p) (2)
where G: R → O is the generator component of the model, operating from the latent space R into the
space of inputs, O.</p>
      <p>Based on (2), the generative ability of successfully trained generative models can be used to create new
data points with “similarity” to identified characteristic patterns by selecting positions in the latent
regions of identified stable clusters, as illustrated in Figure 4 and sketched below.</p>
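      <p>Assuming the autoencoder of the Section 2.1 sketch, whose last seven layers form the decoder, the generator G of (2) can be expressed as follows; the layer slice is an illustrative assumption tied to that sketch.</p>
      <preformat>
import numpy as np

def G(p):
    """Generator G: R -> O of (2); propagates a latent position p of
    shape (1, d) to an observable 64 x 64 x 1 image (sketch)."""
    x = p
    for layer in autoencoder.layers[-7:]:   # decoder half of the autoencoder
        x = layer(x)
    return np.asarray(x)

# Usage: a latent point inside a stable cluster decodes to a new sample.
x_obs = G(np.array([[0.5, -1.2]], dtype="float32"))
      </preformat>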
      <p>The effectiveness of the proposed method of ensemble cluster-based data augmentation can be
supported by the following argument:</p>
      <p>Consider a small dataset S of size N with p input parameters. With a conventional method of
approximation, e.g. Gaussian, the error of the mean in each of the parameters of the data can be
estimated as σp / √N [21]. Where N is small, the dispersion of the observable parameters can be
quite large. Next, if a successful generative representation of a lower dimensionality d with a
good cluster structure exists, the data can be approximated with good accuracy by a
quasi-multimodal distribution with Nc modes, where Nc is the number of stable latent clusters, with
dispersion σd / √nclus, where nclus is the size of a latent cluster. Where d is small (i.e., with a strong
reduction of the dimension of the observable data) and the number of samples in the principal clusters is
sufficiently large, one obtains a statistical problem of significantly lower complexity and dispersion.
As an example, in this study a reduction of dimensionality from 4,096 input parameters (grayscale
images with resolution 64 × 64) to 2, i.e. by a factor of ~2,000, was achieved.</p>
    </sec>
    <sec id="sec-8">
      <title>4. Applications</title>
    </sec>
    <sec id="sec-9">
      <title>4.1. Data and Factor Analysis</title>
      <p>Decomposition of unsupervised datasets into a structure of stable latent clusters, where successful,
can be helpful in the analysis of the distribution of data and associated factors of interest. As
discussed earlier, the proposed approach offers a general capability, independent of specific types of data,
to decompose observable data into a structure of more homogeneous regions, or clusters.
Even without known samples of classes of interest, the distributions of clusters, both latent and
observable, can be analyzed in detail, including observation of characteristic representatives of
clusters, or prototypes.</p>
      <p>The generative ability of models in the study, combined with cluster decomposition of informative
representations, allows non-trivial analysis of the composition of input data by generating typical
observable instances of stable clusters, or prototypes of characteristic natural classes of data. With
stable clusters identified with the proposed method, one can produce specific latent positions associated
with clusters, for example, as the mean of cluster member positions, and propagate them to the
observable space with the generative transformation (2):
P(K) = G(mean(e ∈ K)) (3)
where K: stable latent cluster, P(K): observable prototype. Examples of observable prototypes of
clusters in the dataset G-150 are shown in Figure 5.</p>
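      <p>With the helpers sketched earlier (the assignment pass and the generator G), an observable prototype of a stable cluster can be produced as below; this is a sketch under those assumptions, not the exact implementation of the study.</p>
      <preformat>
def prototype(latent, ids, cluster_id):
    """Observable prototype P(K) of a stable cluster, per (3): decode the
    mean latent position of the cluster's members with the generator G."""
    members = latent[ids == cluster_id]           # latent positions in cluster K
    center = members.mean(axis=0, keepdims=True)  # mean member position
    return G(center.astype("float32"))            # propagate to observable space
      </preformat>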
    </sec>
    <sec id="sec-10">
      <title>4.2. Data Generation and Augmentation</title>
      <p>Based on the discussion in Section 3.2, cluster decomposition can be used to generate new data
points and augment small datasets, again without any limitations on the prior knowledge of the
distribution of the input data. Once a structure of stable latent clusters has been identified, it is
straightforward to determine their distribution regions and produce latent candidate positions for
augmentation of the original data. Generative transformation (2) can then be applied to obtain the
related data points in the space of observable parameters.</p>
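      <p>A sketch of such an augmentation pass, assuming the cluster assignments and generator G from the earlier sketches; Gaussian sampling around the cluster center is an illustrative choice of how to select latent candidate positions.</p>
      <preformat>
import numpy as np

def augment(latent, ids, cluster_id, n_new, scale=0.1, seed=0):
    """Generate n_new observable data points for one stable cluster by
    sampling latent candidates in its region and decoding with G (2)."""
    rng = np.random.default_rng(seed)
    members = latent[ids == cluster_id]
    center = members.mean(axis=0)
    spread = members.std(axis=0) * scale + 1e-6   # stay inside the cluster region
    candidates = rng.normal(center, spread, size=(n_new, latent.shape[1]))
    return [G(c[None, :].astype("float32")) for c in candidates]
      </preformat>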
      <p>To summarize the results and discussion on data generation, ensemble-based generative
augmentation of small data can be successful under these conditions:
      <list list-type="bullet">
        <list-item><p>The latent dimensionality is sufficiently small: d ≪ p.</p></list-item>
        <list-item><p>The models demonstrate good learning success and a consistent, stable cluster structure.</p></list-item>
        <list-item><p>The number of stable clusters is small compared to the size of the dataset (the number of samples), and the population of the main clusters is sufficiently large: N / nclus ≫ 1.</p></list-item>
      </list>
      </p>
      <p>If the conditions are satisfied, the original data can be described by a multi-modal distribution of
stable latent clusters that can be identified with density clustering or another method as demonstrated
earlier; further, augmentation of data can be performed in an unsupervised process based on the
identified distribution in the latent space and will provide stable results invariant with respect to the
selection of a specific instance of the generative model.</p>
    </sec>
    <sec id="sec-11">
      <title>4.3. Classification</title>
      <p>The method of augmentation based on unsupervised cluster structure produced in generative
learning can be employed to improve the success of classification with models of supervised learning
trained with small datasets, as in the conventional practice of supervised learning, the size and
representative quality of training data can have a strong influence on the accuracy of classification [22].</p>
      <p>The success of the method essentially depends on the presence of a correlation between stable
latent structure of clusters and the factor of interest for classification that can be used as a label in
supervised learning. If such a correlation can be established between the data points in identified
stable clusters and the factor of interest as discussed in Section 4.1, an augmentation process outlined
in Sections 3.2, 4.2 can be applied, with class labels assigned to generated data points based on the
established association between latent clusters and known classes. Such a process of augmentation
can produce an improvement in classification accuracy due to larger and more representative dataset
in supervised learning.</p>
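      <p>A sketch of this label-propagation step, assuming the augment helper above and a mapping cluster_to_class (hypothetical, established as in Section 4.1) between stable clusters and known classes:</p>
      <preformat>
import numpy as np

def augmented_training_set(X, y, latent, ids, cluster_to_class, n_per_cluster=50):
    """Extend a small labeled dataset with generated points, labeling each
    new point via the established cluster-to-class association (sketch)."""
    X_new, y_new = list(X), list(y)
    for cluster_id, class_label in cluster_to_class.items():
        for x_gen in augment(latent, ids, cluster_id, n_per_cluster):
            X_new.append(np.asarray(x_gen).reshape(X[0].shape))
            y_new.append(class_label)
    return np.stack(X_new), np.array(y_new)
      </preformat>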
      <p>
        For example, a clustering analysis performed with the ECE dataset [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] demonstrated a strong
correlation of the identified cluster structure with the classification factor of interest, an
epidemiological outcome. In that and similar cases where correlation of identified cluster structure
with an observable factor of interest can be established, augmentation of data with the method
described in this work can produce substantial improvements in classification.
      </p>
    </sec>
    <sec id="sec-12">
      <title>5. Conclusions</title>
      <p>The method of identification of unsupervised cluster structure in generative representations of
small datasets with an ensemble of unsupervised generative models has been described and verified
with small datasets of visual data of basic geometrical shapes. It was shown that a stable structure of
clusters can be identified in the latent representations of successful generative models with high
confidence, in the interval of 90–100%, with the dataset used in the study. The structure of stable clusters
representing characteristic types in the input data can be used to augment small datasets by generating
new data points with several potential applications, including enhancement of labeled datasets with an
objective to improve the success of classification in supervised learning.</p>
      <p>It needs to be noted that the method described in this work may not have universal applicability to
all datasets and its effectiveness is defined not only by the observable parameters, but also by the
composition and characteristics of the dataset, such as the size, the number and population of
principal clusters in the latent representation of the data and their correlation with the factors of
interest. Where the conditions described in Section 3.2 are met, augmentation can produce additional
data points associated with principal clusters and improve the performance of conventional
classification methods trained with augmented datasets.</p>
      <p>In novel, rare, non-standard cases, events and environments, large volumes of known data may be
needed for confident analysis with conventional methods of machine learning. Availability of training
data may present strong challenges in such cases. Methods of unsupervised generative learning,
including the ensemble approach presented in this work, can be successful in identification of
characteristic structure and patterns even with smaller sets of observable data without requirements of
massive prior knowledge, offering a practical direction toward improving the confidence of the
analysis.</p>
    </sec>
    <sec id="sec-13">
      <title>6. References</title>
      <p>[21] Wendland H.: Scattered data approximation. Cambridge University Press 9 (2005).
[22] Richards J.A.: Supervised classification techniques. In: Remote Sensing Digital Image Analysis.</p>
      <p>Springer, Berlin, Heidelberg 247–318 (2013).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Hekler</surname>
            <given-names>E.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klasnja</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chevance</surname>
            <given-names>G.</given-names>
          </string-name>
          et al.:
          <article-title>Why we need a small data paradigm</article-title>
          ,
          <source>BMC Medicine</source>
          ,
          <volume>17</volume>
          (
          <issue>1</issue>
          )
          <fpage>133</fpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Wasserman P.D.</surname>
          </string-name>
          :
          <article-title>Neural computing: theory and practice</article-title>
          . Van Nostrand-Reinhold, New York (
          <year>1989</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>LeBaron</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weigend</surname>
            <given-names>A.S.:</given-names>
          </string-name>
          <article-title>A bootstrap evaluation of the effect of data splitting on financial time series</article-title>
          .
          <source>IEEE Trans. Neural Networks</source>
          <volume>9</volume>
          <fpage>213</fpage>
          -
          <lpage>220</lpage>
          (
          <year>1998</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Cunningham</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carney</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jacob</surname>
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Stability problems with artificial neural networks and the ensemble solution</article-title>
          .
          <source>Artificial Intelligence in Medicine</source>
          ,
          <volume>20</volume>
          (
          <issue>3</issue>
          )
          <fpage>217</fpage>
          -
          <lpage>255</lpage>
          (
          <year>2000</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Karar</surname>
            <given-names>M.E.</given-names>
          </string-name>
          :
          <article-title>Robust RBF neural network-based backstepping controller for implantable cardiac pacemakers</article-title>
          ,
          <source>Int. J. Adap. Cont. Sign. Proc 32</source>
          <fpage>1040</fpage>
          -
          <lpage>1051</lpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Izonin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tkachenko</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dronuyk</surname>
            <given-names>I.</given-names>
          </string-name>
          et al.:
          <article-title>Predictive modeling based on small data in clinical medicine: RBF-based additive input-doubling method</article-title>
          .
          <source>Math Biosc. Eng</source>
          ,
          <volume>18</volume>
          (
          <issue>3</issue>
          )
          <fpage>2599</fpage>
          -
          <lpage>2613</lpage>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Forman</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohen</surname>
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Learning from little: comparison of classifiers given little training</article-title>
          .
          <source>In: Proceedings of PKDD</source>
          ,
          <volume>19</volume>
          <fpage>161</fpage>
          -
          <lpage>172</lpage>
          (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Geris</surname>
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Computational modeling in tissue engineering</article-title>
          . Springer-Verlag, Berlin (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Learning deep architectures for AI</article-title>
          .
          <source>Foundations and Trends in Machine Learning</source>
          <volume>2</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>127</lpage>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Coates</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          :
          <article-title>An analysis of single-layer networks in unsupervised feature learning</article-title>
          .
          <source>In: Proceedings of 14th International Conference on Artificial Intelligence and Statistics</source>
          <volume>15</volume>
          ,
          <fpage>215</fpage>
          -
          <lpage>223</lpage>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Gondara</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Medical image denoising using convolutional denoising autoencoders</article-title>
          .
          <source>In: 16th IEEE International Conference on Data Mining Workshops (ICDMW)</source>
          , Barcelona, Spain,
          <fpage>241</fpage>
          -
          <lpage>246</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Concept learning through deep reinforcement learning with memory augmented neural networks</article-title>
          .
          <source>Neural Networks</source>
          <volume>110</volume>
          ,
          <fpage>47</fpage>
          -
          <lpage>54</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Dolgikh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Unsupervised clustering in epidemiological factor analysis</article-title>
          .
          <source>The Open Bioinformatics Journal</source>
          <volume>14</volume>
          (
          <issue>1</issue>
          ),
          <fpage>63</fpage>
          -
          <lpage>72</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Opitz</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maclin</surname>
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Popular ensemble methods: An empirical study</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          ,
          <volume>11</volume>
          <fpage>169</fpage>
          -
          <lpage>198</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Hornik</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stinchcombe</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>White</surname>
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Multilayer feedforward neural networks are universal approximators</article-title>
          .
          <source>Neural Networks</source>
          <volume>2</volume>
          (
          <issue>5</issue>
          ),
          <fpage>359</fpage>
          -
          <lpage>366</lpage>
          (
          <year>1989</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          :
          <article-title>A tutorial on deep learning: autoencoders, convolutional neural networks and recurrent neural networks</article-title>
          . Stanford University,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Dolgikh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Low-dimensional representations in generative self-learning models</article-title>
          .
          <source>In: Proc. 20th International Conference Information Technologies - Applications and Theory (ITAT-2020)</source>
          , Slovakia,
          <source>CEUR-WS.org 2718</source>
          ,
          <fpage>239</fpage>
          -
          <lpage>245</lpage>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          Keras:
          <article-title>Python deep learning library</article-title>
          . https://keras.io/, last accessed:
          <year>2021</year>
          /08/21.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Dolgikh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Topology of conceptual representations in unsupervised generative models</article-title>
          .
          <source>In: Proc. 26th International Conference on Information Society and University Studies (IVUS 2021)</source>
          , Kaunas, Lithuania,
          <source>CEUR-WS.org 2915</source>
          ,
          <fpage>150</fpage>
          -
          <lpage>157</lpage>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Ester</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kriegel</surname>
            ,
            <given-names>H-P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sander</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al.:
          <article-title>A density-based algorithm for discovering clusters in large spatial databases with noise</article-title>
          .
          <source>Proc. Second International Conference on Knowledge Discovery and Data Mining (KDD-96</source>
          )
          <fpage>226</fpage>
          -
          <lpage>231</lpage>
          (
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>