=Paper=
{{Paper
|id=Vol-2718/paper10
|storemode=property
|title=Low-Dimensional Representations in Generative Self-Learning Models
|pdfUrl=https://ceur-ws.org/Vol-2718/paper10.pdf
|volume=Vol-2718
|authors=Serge Dolgikh
|dblpUrl=https://dblp.org/rec/conf/itat/Dolgikh20
}}
==Low-Dimensional Representations in Generative Self-Learning Models==
Low-Dimensional Representations in Generative Self-Learning Models
Serge Dolgikh
National Aviation University, Kyiv 02000 Ukraine,
sdolgikh@nau.edu.ua
Abstract: Informative representations play an important role in learning and intelligence. We analyzed distributions of image classes in low-dimensional representations created by a class of deep autoencoder neural network models in unsupervised learning. The representations of real aerial images have been shown to contain higher-level concept structures, such as low-dimensional surfaces and higher-density clusters, that form as a result of unsupervised training with minimization of generative error. The compact and well-defined character of some distributions was demonstrated, with a positive correlation between the categorization performance of the model and its classification accuracy. The results provide direct empirical support for the connection between unsupervised learning in models with self-encoding and regeneration and the categorization of native concepts in the representations.

Copyright (c) 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction: Unsupervised Representations

The study of unsupervised representations, with the intent to identify and separate the most informative components in general data, has a long history in machine learning. Unsupervised hierarchical representations created with models like Restricted Boltzmann Machines (RBM), Deep Belief Networks (DBN) [1, 2] and different types of autoencoder models [3] proved to be efficient and improved the accuracy of subsequent classification [4]. The deep relationship between the training of intelligent models and statistical principles such as minimization of free energy was studied in [5, 6] and other works, leading to the understanding that common training methods, such as gradient descent in deep neural networks and Contrastive Divergence in DBN, generally produce configurations compatible with the principles of minimization of free energy and variational Bayesian inference.

On the experimental side, interesting effects of spontaneous high-level concept sensitivity in unsupervised deep neural network models were observed in a number of works. The Google Lab team [7] observed an intriguing effect of spontaneous formation of concept-sensitive neurons, activated by images in a certain higher-level category, in a massive deep and sparse autoencoder neural network model trained in an entirely unsupervised process, without any exposure to ground truth, with very large arrays of YouTube images.

In [8] a spontaneous formation of grid-like cells, similar to those observed in mammals, was detected in a recurrent neural network with deep reinforcement learning. Higher-level concept-related structures were observed in the representations of deep autoencoder models with strong redundancy reduction, with data representing raw Internet traffic in large public telecommunications networks, in [9]. The results demonstrated that the density structure in the representations created by such models, which emerges as a result of unsupervised training with minimization of generative error, can be used in an iterative approach to training artificial learning systems that offers higher flexibility and considerably lower ground-truth requirements compared to common methods. Representations of deep variational autoencoder models were studied in [10], demonstrating effective disentangled representations with data of several different types in entirely unsupervised learning under the constraints of redundancy reduction.

These and a number of further results [11, 12] may suggest that certain neural networks, whether artificial or biological, in the process of unsupervised learning with an incentive to improve the quality of regeneration of the observable data, may naturally structure information by characteristics of similarity in their representations, thus identifying certain natural or native concepts that can perhaps be correlated with higher-level concepts in the observable data. Based on this observation, the hypothesis investigated in this work is that the natural structure in the representations created by certain unsupervised models in self-supervised learning with minimization of the generative error can be correlated with higher-level concepts in the input data, and that this relationship can be used in developing approaches to flexible and iterative learning in environments where prior domain knowledge is scarce or not available.

In this study we follow the line of research outlined in [7, 9] by first creating a compact representation of the observable dataset with a deep self-encoding neural network model (a two-stage stacked autoencoder), and then analysing the parameters of the distributions of the higher-level concepts of the dataset in the representation created in unsupervised training. Unlike [7], which investigated single-neuron, that is, essentially one-dimensional, representations and distributions of concepts (Fig. 1), the design of the models in this study, with physical constraints on the dimensionality of the representation layer, created low-dimensional representations. This allowed us to improve the resolution of the learned concepts from "better than random" across an arbitrarily selected, pre-known range of higher-level concepts in [7] to "better than random binary" and, in a number of cases, "confident binary" classification per concept that was not known previously. It is thought that these results can be of interest to the research community in unsupervised learning and self-learning systems because, as some recent studies indicate [13, 14], similar low-dimensional representations with only a small number of active neurons can play an important role in sensory networks of biologic systems, such as visual and smell processing; as well, the connection between unsupervised learning and concept structures in the representations may suggest approaches to self-learning that would be common for biologic and artificial systems.

Figure 1: Effective activation of a concept-sensitive neuron (based on [7])
2 Methods

The model used in the study is a stacked two-stage autoencoder with strong physical compression in the layer of the final representation. This choice was based on the earlier cited results as well as some strong arguments in favor of neural network models based on generative self-learning being good candidates for producing effective unsupervised representations. Being universal approximators [15], feedforward neural networks have virtually unlimited versatility and are well suited to modeling complex data types. And not in the least, deep neural networks are widely present in biologic systems, which are also highly successful in self-learning with minimal data [16].

The data was represented by a dataset of raw images obtained in aerial observation of terrain, as described in this section.

2.1 Deep Stacked Autoencoder Model

The diagram of the model is given in Figure 2. The model produced two stages of representations of unprocessed aerial image data. The encoder of the first stage was a convolutional-pooling autoencoder that produced a numerical representation of dimension 576 from color images with dimensions (32,32) to (128,128). The aim of this stage was to acquire higher-scale features in the images via a sequence of convolution-pooling stages.

The resulting numerical representation was used as the input to the second-stage autoencoder with a strong reduction of the physical dimensionality of the representation layer. The dimension of the representation was chosen based on a principal component analysis of the numerical representation of the first stage, which revealed three components with a combined variation of over 0.95. Hence, the maximum compression of information achieved in the representation layer of the model was approximately 16,000, from the input images in the first stage to the final representation.

Figure 2: Stacked autoencoder model with physical redundancy reduction
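For illustration, a minimal sketch of a stacked model of this kind is given below, assuming Keras/TensorFlow [17]. Only the 576-dimensional first-stage code and the low-dimensional (here, 3-dimensional) final representation follow the description above; the input size, filter counts, activations and other hyperparameters are assumptions made for the sketch, not the configuration used in the study.

    # Sketch of a two-stage stacked autoencoder (illustrative hyperparameters).
    import tensorflow as tf
    from tensorflow.keras import layers, Model

    # Stage 1: convolutional-pooling autoencoder, image -> 576-d code
    img_in = tf.keras.Input(shape=(64, 64, 3))
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(img_in)
    x = layers.MaxPooling2D(2)(x)                      # 32 x 32
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(x)
    x = layers.MaxPooling2D(2)(x)                      # 16 x 16
    x = layers.Conv2D(9, 3, activation="relu", padding="same")(x)
    x = layers.MaxPooling2D(2)(x)                      # 8 x 8 x 9 = 576
    code1 = layers.Flatten(name="stage1_code")(x)

    y = layers.Reshape((8, 8, 9))(code1)
    y = layers.Conv2D(16, 3, activation="relu", padding="same")(y)
    y = layers.UpSampling2D(2)(y)
    y = layers.Conv2D(16, 3, activation="relu", padding="same")(y)
    y = layers.UpSampling2D(2)(y)
    y = layers.Conv2D(16, 3, activation="relu", padding="same")(y)
    y = layers.UpSampling2D(2)(y)
    img_out = layers.Conv2D(3, 3, activation="sigmoid", padding="same")(y)

    stage1 = Model(img_in, img_out)
    stage1_encoder = Model(img_in, code1)
    stage1.compile(optimizer="adam", loss="mse")       # minimize generative error

    # Stage 2: dense autoencoder, 576-d code -> 3-d latent representation
    code_in = tf.keras.Input(shape=(576,))
    h = layers.Dense(64, activation="relu")(code_in)
    latent = layers.Dense(3, name="representation")(h)
    h = layers.Dense(64, activation="relu")(latent)
    code_out = layers.Dense(576)(h)

    stage2 = Model(code_in, code_out)
    stage2_encoder = Model(code_in, latent)
    stage2.compile(optimizer="adam", loss="mse")

    # Unsupervised training: the inputs are also the targets (no labels used)
    # stage1.fit(images, images, epochs=..., batch_size=...)
    # codes = stage1_encoder.predict(images)
    # stage2.fit(codes, codes, epochs=..., batch_size=...)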
A certain advantage of the studied models is that they allow measuring and visualizing the distributions created in unsupervised training directly from the central layer of the latent representation. In feed-forward neural networks, the accuracy of regeneration of the input data combined with significant compression in the layer of representation means that the latent representation has retained significant essential information about the original distribution, and observing it directly may yield valuable insights about the character of the concept distributions in the observable data.

The models were implemented with Keras/Tensorflow [17]. For measurement and visualization of distributions we used common libraries and packages such as scikit-learn, numpy, matplotlib and others.

Models were trained in an unsupervised autoencoder mode to achieve good reproduction of the inputs, measured by a cost function such as Mean Squared Error (MSE). Several criteria of the effectiveness of unsupervised training were used, such as monitoring the cost function and cross-categorical accuracy, both of which showed significant improvement in unsupervised training with minimization of the generative error. Additionally, the generative performance of trained models was measured by calculating the mean deviation of the input sample from the generated output relative to the mean norm of the input sample, with an average result in the range of 0.1.

In our view, these training results and the fact that in feed-forward neural network models the output is generated only from the information contained in the representation layer indicate that the latent representation has indeed retained significant essential information about the original distribution.
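The relative generative-error measure described above can be computed along the following lines; this is a sketch, and the exact normalization used in the study may differ.

    import numpy as np

    def relative_generative_error(x, x_rec):
        # Mean deviation of the input samples from the generated outputs,
        # relative to the mean norm of the input samples (values near 0.1
        # indicate good reproduction in the sense used above).
        x = x.reshape(len(x), -1)
        x_rec = x_rec.reshape(len(x_rec), -1)
        deviation = np.linalg.norm(x - x_rec, axis=1).mean()
        return deviation / np.linalg.norm(x, axis=1).mean()

    # e.g.: err = relative_generative_error(images, stage1.predict(images))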
2.2 Data

The dataset consisted of approximately 1,100 color images with resolution (64,64), manually labeled with ten higher-level classes of terrain type, such as "trees", "buildings", "water", etc., as described in Table 1. The higher-level classes used in the study represented three different broad categories:

1. Background: the area of the class concept spans the entire image or most of it; examples are "trees" or "field".
2. Structure: the concept area spans a significant part of the image area, such as roads; construction structures, e.g. bridges, power lines; excavations.
3. Object: an object located in a compact area relative to the size of the image; vehicles and miscellaneous machinery were in this category.

The composition of the dataset, with classes of different categories, allowed us to investigate the character of concept distributions in the latent representations for different types of higher-level concepts.

Table 1: Aerial image dataset

Class: Category, Number of samples
Buildings (1): background/structure, 100
Trees (2): background, 100
Field (3): background, 100
Water (4): background/structure, 100
Roads (5): structure, 100
Excavations (6): structure, 100
Vehicles (7): object, 100
Other (8-10): varied, 400

2.3 Unsupervised Representations

A trained model can perform the encoding transformation from the observable data space to the latent representation, obtained with the activations of the central layer of the Phase 2 encoder, and the generative transformation from the latent representation to the observable space as:

R(X) = encoder_model.encode(X) (1)
X'(Y) = generator_model.decode(Y) (2)

In the latent representation of a trained model, the emergent density structure can be identified by applying a density-based clustering method such as DBSCAN, Mean-Shift and numerous variations [18]. It allows identifying density clusters of the encoded samples in the representation space without any need for ground truth data. For example, the associated density cluster for a sample X in the input data space can be calculated as:

Knat(X) = cluster_model.predict(X) (3)

where cluster_model is a density-based clustering method trained with a general data sample in the latent representation.

To perform classification, a binary concept classifier can be trained with a subset of labeled concept samples in the latent space. The resulting classifier can be applied to predict the explicit concept class of samples in the input space as:

Kexp(X) = classifier.predict(encode(X)) (4)

where Kexp is the explicit or external class of the sample X predicted by the trained classifier. Thus Kexp and Knat represent, respectively, the externally known class of the sample and its native or implicit cluster identified from the density distribution in the latent space, needing no external knowledge of the domain, distribution, or any other prior knowledge about the data.

The structure in the latent representation that emerges as a result of unsupervised training, or the "unsupervised landscape", can be measured and observed by the following methods:

1. By applying unsupervised clustering in the representation to identify the density distribution in a general unlabeled data sample as well as in concept samples;
2. By measuring the parameters of general and concept distributions in the representation space;
3. By applying multi-dimensional histogram methods in the representation space to measure density and volume distributions in general and concept samples;
4. Via visualization and direct observation of general and concept samples in the representation space.
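As an illustration of how the transformations (1), (3) and (4) could be realized, the sketch below uses Mean-Shift clustering and a logistic-regression classifier from scikit-learn. The encoder names follow the model sketch in Section 2.1, and general_images, labeled_images and in_concept stand for a general unlabeled sample and a small labeled concept subset; these names and the choice of classifier are assumptions for the sketch, not the exact implementation used in the study.

    import numpy as np
    from sklearn.cluster import MeanShift
    from sklearn.linear_model import LogisticRegression

    def encode(x_images):
        # Encoding transformation (1): observable data -> latent representation,
        # chaining the two stage encoders of a trained model.
        return stage2_encoder.predict(stage1_encoder.predict(x_images))

    # Eq. (3): native (implicit) cluster of a sample, from density-based
    # clustering of a general, unlabeled sample in the latent space.
    cluster_model = MeanShift().fit(encode(general_images))

    def K_nat(x_images):
        return cluster_model.predict(encode(x_images))

    # Eq. (4): explicit concept class, from a binary classifier trained on a
    # small labeled subset in latent coordinates (in_concept: 1 = concept, 0 = not).
    classifier = LogisticRegression().fit(encode(labeled_images), in_concept)

    def K_exp(x_images):
        return classifier.predict(encode(x_images))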
2.4 Unsupervised Categorization

By unsupervised categorization is meant the ability of some models with unsupervised self-encoding and regeneration of the input data to group data samples in the latent representation into compact structures by a certain similarity. Such natively similar samples in the representation are then transformed in the generative stage of the model into samples in the observable data space that are related by association to the same, or related, native concepts.

To measure the categorization ability of models, two types of data samples were used:

1) concept samples, transformed to the representation space, define the concept distribution region, that is, the region in the representation space where samples associated with a certain higher-level concept can be found;
2) a general sample, a set of non-labeled data points that is used to identify and measure the size and shape of the region in the representation space that is populated by all categories, in other words, the image of a representative subset of the input dataset in the latent space of the model.

Relative measurements of concept versus general distributions allow drawing conclusions about the categorization performance of the model for the given concept, such as the relative size and density of concept distribution regions, their shape, dimensionality and other parameters that can affect learning of the concept.

Distributions of data in the latent representations, or the density landscape, created by such models in the process of unsupervised learning can then be analyzed, measured and visualized by transforming marking subsets of labeled concept samples to the latent space with the encoding transformation (1), while the generative ability of the model can be evaluated by measuring the deviation of the generated output from the input.

The hypothesis that can be drawn from the results discussed earlier is that a structured information "landscape" that emerges in unsupervised training of the models, with the incentive to reduce the regeneration error, can be correlated with higher-level concepts that have strong representation in the input data.

3 Results

3.1 Visualization Analysis

Concept distributions in the latent space created in unsupervised training with minimization of the regeneration error can be visualized and measured directly. To produce visualizations of concept regions, subsets of concept samples were transformed to the latent state of a pre-trained model and visualized with available plotting packages. Continuous approximations of the concept regions in the latent space were obtained with triangular interpolation of the concept samples transformed to the latent space.
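One possible way to produce such visualizations with matplotlib is sketched below, assuming a three-dimensional latent representation and the hypothetical encode() helper from the sketch in Section 2.3; plot_trisurf performs the triangular interpolation of the encoded concept samples.

    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D  # noqa: registers the 3-d projection

    def plot_concept_region(latent_points, label=None):
        # Visualize a concept region in a 3-d latent space as a triangulated surface.
        fig = plt.figure()
        ax = fig.add_subplot(111, projection="3d")
        x, y, z = latent_points[:, 0], latent_points[:, 1], latent_points[:, 2]
        ax.plot_trisurf(x, y, z, alpha=0.6)   # triangular interpolation of samples
        ax.scatter(x, y, z, s=4, label=label)
        if label:
            ax.legend()
        plt.show()

    # e.g.: plot_concept_region(encode(water_samples), label="Water (4)")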
Compact Distributions

It was observed that the classes in the "background" and some in the "structure" category, covering a significant part of the image area, generally produced compact and well-defined concept regions in the form of a two-dimensional surface. These distributions are illustrated in Fig. 3. In the diagram, the top plot shows distributions of two concepts of the "background" type (classes 3 and 4) in the latent space of a trained unsupervised model. The surface character of the distributions can be clearly observed visually and is confirmed by PCA analysis of the encoded concept samples, which yielded over 80% variance for the two highest components. The bottom plot visualizes distributions of several concepts simultaneously in a compact region of the latent space. Interestingly, the distribution regions of the multiple concepts, again in the clear form of two-dimensional surfaces, are layered quite closely together rather than being separated into isolated clusters, as was the case with classes of some other categories. In this pattern, concept regions are stacked closely in the same region of the representation space like an "onion shell", a strategy that allows packing data in a very compact and efficient way.

Figure 3: Compact concept distributions in the latent space

It is worth noting that these results also substantiate the manifold assumption commonly used in unsupervised and semi-supervised learning [19]. For most of the studied concepts in this category, the distribution regions indeed consisted of connected and smooth manifolds or sets of such manifolds. The results of measurements of the distribution parameters for these concepts will be presented in the next section.

Sparse Distributions

Distributions of object- and structure-type concepts showed a different pattern that was noticeably sparser and spread over the latent representation. In Fig. 4, concept regions of "structure" and "object" classes are shown together with the compact classes, which allows comparing the relative scales of the variation in the concept regions of classes of different categories: top plot, classes 6 (sparse) and 3 (compact); bottom plot, classes 7 (sparse) and 2, 4 (compact). A clear difference in the character of the distributions of different categories can be observed in the distribution visualizations in Fig. 3 and 4. Interestingly, while larger-scale background-type concepts appear to occupy a compact and well-defined region in the latent space with a small number of, or a single, dominant cluster, classes representing local concepts are spread throughout the representation space in multiple clusters. A possible explanation for the latter observation could be that the relationship between the explicit higher-level concepts that label the samples in the dataset and the internal or native concept clusters (3) in unsupervised mode can be more complex than one-to-one. For example, an explicit higher-level concept may encompass a number of different native clusters, in which case a distribution of the type seen in Fig. 4 can be observed.

Figure 4: Sparse concept distributions in the representation space

Another logical possibility is that the complexity and depth of the models used in the study, as well as the size of the dataset, were not sufficient to identify these more complex patterns with sufficient confidence. This question requires further investigation.

3.2 Categorization and Classification

In this section we attempted to establish the relationship between the categorization properties of concept distributions in the unsupervised representations and the performance of supervised learning with training data in the representation space of a trained model.

As mentioned in the previous sections, the categorizing ability of an unsupervised model can be evaluated with two essentially different approaches. First, in a completely unsupervised mode, where the external concept labels are not provided with the samples in the dataset, the parameters of the general distribution can be measured, such as its dimensions, shape and the parameters of the density distribution. These measurements are important because they provide an a priori evaluation of the categorization ability of the model before any knowledge of external semantics, such as known higher-level categories associated with the input data, has been applied. For that reason, these methods can be applied to data of any nature in a truly general manner.

On the other hand, if external labels for a subset of the data are available (as was the case in this study), it should be possible to train a classifier with labeled data in a supervised mode, but with the parameters or "features" being the coordinates in the representation space of a pre-trained model, as in (4). Comparing the results of the two approaches can indicate how closely the structure that emerges in the representation as a result of unsupervised learning reflects the external concepts used in the supervised approach.

The results of measurements of the distribution parameters and the classification accuracy for selected concepts of each of the scale types are presented in Table 2. The parameters of the concept distribution region in the latent space were defined ([9]) as:

Spread: a characteristic size of the region relative to that of the general distribution;
Concentration: the number of concept density clusters relative to the total number of clusters in the general distribution;
Density: the density of the structure, measured as the population per volume in the latent coordinates, relative to the density of a uniform distribution.

Finally, Accuracy for the concept was measured as the F1 classification score, which accounts for classification errors of both types. The accuracy of a trained classifier was measured with multiple batches of randomly selected in- and out-of-class test samples. Note that the second value in the accuracy column relates to the self-learning accuracy that will be discussed in the next section.

Table 2: Self-learning with unsupervised representations

Class: Categorization (Spread, Concentration, Density); Accuracy (classification, self-learning)
Background
Trees (2): 0.16, 0.06, 246; 0.79, 0.65
Field (3): 0.18, 0.06, 357; 0.81, 0.72
Water (4): 0.19, 0.08, 375; 0.84, 0.78
Structure
Roads (5): 0.23, 0.11, 228; 0.68, 0.57
Excavations (6): 0.28, 0.14, 292; 0.71, 0.54
Object
Vehicles (7): 0.78, 0.22, 135; 0.73, 0.53

In the results, a clear correlation can be seen between the parameters of a concept distribution in the representation space and the accuracy of the concept classifier trained with a labeled subset of in- and out-of-concept samples in the latent coordinates. It can be seen as another indication, in addition to the already mentioned results, that unsupervised training, perhaps under certain conditions and constraints as discussed in [10, 20], can produce configurations of data in the representation space that are correlated with common higher-level concepts in the observable data.
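The relative distribution parameters defined above could be estimated, for instance, along the lines of the sketch below; the particular estimators (range-based spread, Mean-Shift cluster counts, histogram-based density) are illustrative assumptions and not necessarily those used in [9] or in this study.

    import numpy as np
    from sklearn.cluster import MeanShift

    def spread(concept_latent, general_latent):
        # Characteristic size of the concept region relative to the general distribution.
        return np.ptp(concept_latent, axis=0).mean() / np.ptp(general_latent, axis=0).mean()

    def concentration(concept_latent, general_latent):
        # Number of concept density clusters relative to the total number of clusters.
        n_concept = len(np.unique(MeanShift().fit(concept_latent).labels_))
        n_general = len(np.unique(MeanShift().fit(general_latent).labels_))
        return n_concept / n_general

    def density(concept_latent, bins=20):
        # Population per volume of the occupied cells of a multi-dimensional histogram,
        # relative to a uniform distribution over the same bounding box.
        hist, _ = np.histogramdd(concept_latent, bins=bins)
        per_occupied_cell = hist.sum() / np.count_nonzero(hist)
        per_cell_uniform = hist.sum() / hist.size
        return per_occupied_cell / per_cell_uniform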
3.3 Self-Learning with Unsupervised Representations

As was shown in [9], the structure emergent in the latent representations as a result of unsupervised training can be used in learning new concepts with minimal data, down to counted positive samples. The approach has unsupervised and semi-supervised learning phases:

- in the unsupervised phase, which requires no labeled data, principal density clusters with significant population are identified, as was outlined earlier in Section 2.3, (4); these structures can be seen as principal native concepts in the observable data;
- in the semi-supervised self-learning phase that follows, a small number of positive concept samples is used to tag or mark the clusters that can be associated with the concept being learned, creating a small labeled dataset from the genuine concept samples and those obtained from the unsupervised cluster distribution;
- then a binary concept classifier is trained with this dataset and can be used for prediction of the concept being learned for new samples in the input space.

Because the genuine labeled samples are used only for tagging the clusters of interest, the method can indeed work with very minimal sets of labeled concept data, down to a single "signal" sample.
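A minimal sketch of this two-phase procedure is given below, reusing the hypothetical encode() helper and cluster_model from the sketch in Section 2.3; signal_samples stands for the one or few genuine positive samples of the concept being learned, and the choice of classifier is an assumption for the sketch.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Unsupervised phase: native density clusters of a general, unlabeled sample.
    general_latent = encode(general_images)
    native_clusters = cluster_model.predict(general_latent)

    # Semi-supervised phase: tag the cluster(s) hit by the signal sample(s) ...
    tagged = np.unique(cluster_model.predict(encode(signal_samples)))

    # ... and build a small labeled set: members of the tagged clusters as
    # positives, the rest of the general sample as negatives.
    y = np.isin(native_clusters, tagged).astype(int)
    concept_classifier = LogisticRegression().fit(general_latent, y)

    # The trained classifier predicts the learned concept for new input samples:
    # predictions = concept_classifier.predict(encode(new_images))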
In this section, single-sample self-learning based on the unsupervised density structure was applied to the image dataset, with the results for representative classes in each category presented in Table 2 as the second value in the accuracy column.

These results show that the concepts with compact representations were learned successfully with a single sample of the concept, while those with a more spread and sparse representation achieved only a marginally better resolution compared to the random strategy. A possible explanation for this effect can be found in the analysis of the distribution patterns for the concepts in Fig. 3 and 4. If the representation image of a higher-level concept comprises several native clusters, the data points generated in the vicinity of the signal sample would not sufficiently cover the entire distribution region of the concept in the latent space, and the resolution of the classifier would be reduced. This was confirmed by further experiments, where it was observed that increasing the size of the learning sample for this type of concepts substantially improves the results of learning.

Unlike supervised methods traditional in machine learning, learning with the unsupervised density structure, or density landscape, is more reminiscent of the learning processes in biologic systems, which are often spontaneous, flexible and require minimal data, building accuracy gradually over a sequence of learning iterations. Landscape-based learning can imitate such processes by testing concept distribution regions in an iterative trial-and-error process, as and when learning data becomes available, in a close interaction with the environment.

4 Conclusion

The analysis of higher-level concept distributions of image data in the latent space of self-learning models presented in this work is in agreement with the earlier findings that unsupervised training of models with self-encoding and regeneration can lead to the emergence of an identifiable structure in the latent representation that can be correlated with higher-level concepts in the observable data.

Correlation of classification accuracy with the categorization parameters of the concept distributions in the latent space of such models has now been shown with data of different types and nature [7, 10, 9], pointing at the possibility of a general character of this effect.

Low-dimensional representations can be of interest due to growing evidence that such representations can play an important role in the processing of sensory data by biologic systems. Recent results [13, 14] demonstrated that effective representations of sensory data such as images and smells can be produced with a small number of active neurons in biologic neural networks. Linking these results with the findings in this work, where examples of such low-dimensional representations created artificially were investigated, one can hypothesize that, perhaps, the representations in more complex sparse neural networks, even of a massive kind [7], can be modeled as a set or a "stack" of low-dimensional representation regions indexed by the combination of neurons that collectively participate in creating the latent representation, with surface-like concept regions, observed in our results, distributed in them (Fig. 5). In such a stacked representation, a concept region, for example "cats", can be indexed by the indices of the activated neurons Wk and the index Sk of the concept surface in the representation subregion of Wk: Icats = (Wcats, Scats).

Figure 5: Concept regions in a sparse latent representation

Thus, prototypes of native concepts in the observable data can form in an unsupervised observation of the environment via self-learning with minimization of the error of regeneration, requiring minimal supervision and prior knowledge of the domain.

Analysing concept distributions in the representations of deep learning models can offer a novel perspective on the program of Explainable AI [21]. Much effort has been invested by the research community in attempts to describe the learning configurations and rules that emerge in complex and deep learning models in training. Understanding the native structure of information in the latent representation created in training can offer a different and, in some cases, very visual interpretation of the learning processes in these systems.

All in all, it is believed that the study of the native categorization properties of generative models may lead to a better understanding of the underlying principles of self-learning and to the development of models that could learn in a more natural way [16], closer to the spontaneous and iterative learning processes in biologic systems.

Acknowledgements

The author is grateful to Prof. Pilip Prystavka, Chair of Applied Mathematics, National Aviation University (Kyiv), for valuable discussions of the findings and the opportunity to use the dataset of images used in this work.
References

[1] Hinton, G., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief nets. Neural Comp. 18(7) (2006) 1527–1554
[2] Fischer, A., Igel, C.: Training restricted Boltzmann machines: an introduction. Pattern Recogn. 47 (2014) 25–39
[3] Bengio, Y.: Learning deep architectures for AI. Found. Trends Machine Learning 2(1) (2009) 1–127
[4] Coates, A., Lee, H., Ng, A.Y.: An analysis of single-layer networks in unsupervised feature learning. Proc. 14th Intl. Conf. on Artificial Intelligence and Statistics 15 (2011) 215–223
[5] Ranzato, M.A., Boureau, Y.-L., Chopra, S., LeCun, Y.: A unified energy-based framework for unsupervised learning. Proc. 11th Intl. Conf. on Artificial Intelligence and Statistics 2 (2007) 371–379
[6] Friston, K.: A free energy principle for biological systems. Entropy 14 (2012) 2100–2121
[7] Le, Q.V., Ranzato, M.A., Monga, R., et al.: Building high-level features using large scale unsupervised learning. arXiv 1112.6209 (2012)
[8] Banino, C., Barry, D., Kumaran, D.: Vector-based navigation using grid-like representations in artificial agents. Nature 557 (2018) 429–433
[9] Dolgikh, S.: Categorized representations and general learning. Proc. 10th Intl. Conf. on Theory and Application of Soft Computing, Computing with Words and Perceptions 1095 (2019) 93–100
[10] Higgins, I., Matthey, L., Glorot, X., Pal, A., et al.: Early visual concept learning with unsupervised deep learning. arXiv 1606.05579 (2016)
[11] Shi, J., Xu, J., Yao, Y., Xu, B.: Concept learning through deep reinforcement learning with memory-augmented neural networks. Neural Networks 110 (2019) 47–54
[12] Rodriguez, R.C., Alaniz, S., Akata, Z.: Modeling conceptual understanding in image reference games. In: Advances in Neural Information Proc. Syst. (Vancouver, BC) (2019) 13155–13165
[13] Yoshida, T., Ohki, K.: Natural images are reliably represented by sparse and variable populations of neurons in visual cortex. Nature Communications 11 (2020) 872
[14] Bao, X., Gjorgiea, E., Shanahan, L.K., et al.: Grid-like neural representations support olfactory navigation of a two-dimensional odor space. Neuron 102(5) (2019) 1066–1075
[15] Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward neural networks are universal approximators. Neural Networks 2(5) (1989) 359–366
[16] Hassabis, D., Kumaran, D., Summerfield, C., et al.: Neuroscience inspired Artificial Intelligence. Neuron 95(2) (2017) 245–258
[17] Keras: Python deep learning library. https://keras.io/
[18] Fukunaga, K., Hostetler, L.D.: The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Trans. Inf. Theory 21(1) (1975) 32–40
[19] Zhou, X., Belkin, M.: Semi-supervised learning. In: Acad. Press Lib. in Signal Proc., Elsevier (2014) 1239–1269
[20] Dolgikh, S.: Why good generative models categorize. Int. Journ. Mod. Edu. Comp. Sci. (2020) (to appear)
[21] Gilpin, L.H., Bau, D., Yuan, B.Z., et al.: Explaining explanations: an overview of interpretability of machine learning. arXiv 1806.00069 (2018)