Sparse generative representations of handwritten digits

Serge Dolgikh

National Aviation University, 1 Lubomyra Huzara Ave, 03058 Kyiv, Ukraine


Abstract
We investigated the process of unsupervised generative learning and the structure of informative generative representations of images of handwritten digits (the MNIST dataset). Learning models with a sparse convolutional autoencoder architecture, constrained to produce low-dimensional representations, achieved successful generative learning, demonstrated by the high accuracy of the generated images. A well-defined, continuous and connected structure of the generative representations was observed and described. Structured informative representations of unsupervised generative models can be an effective platform for the investigation of the origins of intelligent behaviors in artificial and biological learning systems.

Keywords
Artificial neural networks, generative machine learning, representation learning, clustering


1. Introduction

Representation learning, with the objective to identify informative elements in the observable data, has a well-established record in machine learning. Informative representations obtained with Restricted Boltzmann Machines (RBM) and Deep Belief Networks (DBN) [1, 2], different flavors of autoencoders [3] and other models made it possible to improve the accuracy of supervised learning [4]. The relations between learning and statistical thermodynamics were studied in [5] and other works, leading to an understanding of a deep connection between learning processes and the principles of information theory and statistics.

In experimental studies, a range of results was reported, such as the "cat experiment" that demonstrated the spontaneous emergence of concept sensitivity at the level of a single neuron in unsupervised deep learning with image data [6]. Disentangled representations were produced and discussed [7] with a deep variational autoencoder and different types of data, pointing at the possibility of a general nature of the effect. Concept-associated structure was observed in latent representations of Internet network traffic [8] and images [6, 7, 9], and a number of other results were reported with different types of data and applications [10, 11].

These results demonstrated that the structure that emerges in the latent representations created by models of generative learning in the process of unsupervised self-learning with minimization of generative error can have intrinsic associations with characteristic patterns in the observable data and, perhaps, can be used as a foundation for learning methods and processes that use these associations for improved efficiency.

Interestingly, these observations in unsupervised machine learning were paralleled by a number of recent results on biologic sensory networks [12, 13] that demonstrated the commonality of low-dimensional representations in the processing of sensory information by mammals, including humans.

These previous findings prompted and stimulated an investigation into the process of production and the essential characteristics of low-dimensional informative latent representations obtained with neural network models of unsupervised generative self-learning, including the formation of a conceptual structure in the latent representations of the sensory environment of the learner.

                         IVUS 2022: 27th International Conference on Information
                         Technology, May 12, 2022, Kaunas, Lithuania
                         EMAIL: sdolgikh@nau.edu.ua (S. Dolgikh)
                         ORCID: 0000-0001-5929-8954 (S. Dolgikh)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

                                      CEUR Workshop Proceedings (CEUR-WS.org)


The questions investigated in this work were the following: what are the characteristics of the latent representations of successful generative models? Is there an association between the characteristic patterns (or higher-level concepts) in the input data and the latent distributions produced by learning models? What structure can be identified in the latent representations by entirely unsupervised methods, without prior knowledge of the conceptual content of the input data?

These questions were approached with generative models of deep neural network architecture and a dataset of real, unprocessed images of handwritten digits (the MNIST dataset), used widely in studies of machine intelligence systems. The intent of the study is to understand how common generative models of unsupervised self-learning, even of limited complexity, can produce informative and structured representations of input data modeling sensory environments.

The novelty of the presented approach is associated with using a "generic" generative architecture with clearly defined directions of possible incremental variation and evolution. Using this type of architecture can provide answers to the essential question of how the complex architectures reported in the cited results could have developed in realistic learning systems.

Throughout the work, externally known types or patterns in the input data that model the observable sensory environment of a learning system will be referred to as "higher-level concepts" or "external concepts"; they signify a class of a sample in the input space that is defined by an external process, outside of the model. An example of an external concept for an image with a geometric shape can be the word "triangle" or a specific symbol. In contrast, structures in the latent representations of the observable space that can be identified entirely by unsupervised means, without any external or prior information, will be referred to as "internal", "natural" or "native" concepts [14].

A priori, there is no reason to assume that external and native concepts are related or correlated, so the relation between them is an interesting and intriguing question in its own right.

2. Methods and data

2.1. Model architecture

The convolutional autoencoder model [15] used in this work had an encoding stage of convolution-pooling layers followed by several layers of dimensionality reduction, ending in a sparse encoding layer of size 20-25 that produces an effective low-dimensional latent representation described by the activations of the neurons in the encoding layer.

A sparsity penalty was applied to the latent activations during training as L1 regularization, resulting in 2 to 4 neuron activations for most images in the dataset. The decoding / generative stage was fully symmetrical to the encoder. The diagram of the architecture used in this work is shown in Figure 1.

Figure 1: Sparse convolutional autoencoder model

Overall, the models had 21 layers and ~9 × 10⁴ trainable parameters. The models were implemented in the Keras / TensorFlow programming package [16] and trained for minimization of the generative error, i.e., an average norm of the difference between the input images and the output produced by the model on the unsupervised training set, defined by the categorical cross-entropy (CCE) cost function.
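For illustration, a minimal Keras sketch of an autoencoder of this type follows. The filter counts, layer sizes and L1 weight are assumptions of the sketch rather than the exact 21-layer configuration of the study, and a pixel-wise binary cross-entropy loss is used as a runnable stand-in for the CCE cost described above.

from tensorflow.keras import layers, models, regularizers

LATENT = 24  # sparse encoding layer of size 20-25 (24 assumed, cf. Section 2.5)

inputs = layers.Input(shape=(28, 28, 1))
x = layers.Conv2D(16, 3, activation='relu', padding='same')(inputs)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(8, 3, activation='relu', padding='same')(x)
x = layers.MaxPooling2D(2)(x)
x = layers.Flatten()(x)
x = layers.Dense(128, activation='relu')(x)
# sparse encoding layer: L1 activity regularization drives most activations to zero
latent = layers.Dense(LATENT, activation='relu', name='latent',
                      activity_regularizer=regularizers.l1(1e-4))(x)
# decoding / generative stage, symmetrical to the encoder
x = layers.Dense(128, activation='relu')(latent)
x = layers.Dense(7 * 7 * 8, activation='relu')(x)
x = layers.Reshape((7, 7, 8))(x)
x = layers.UpSampling2D(2)(x)
x = layers.Conv2D(16, 3, activation='relu', padding='same')(x)
x = layers.UpSampling2D(2)(x)
outputs = layers.Conv2D(1, 3, activation='sigmoid', padding='same')(x)

autoencoder = models.Model(inputs, outputs)
# runnable stand-in for the CCE generative-error cost reported in the study
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')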
2.2. Data

The dataset of images used in the study, MNIST [17], consisted of three sets of images (training, validation and test) of handwritten digits from 0 to 9, produced by different real individuals. The models were trained on a subset of 10,000 images, with approximately equal representation of all digits.

To ensure the entirely unsupervised character of the latent representations created by the trained models, labeled samples were not used in the generative training phase, but only in the analysis of the distributions of higher-level concepts in the latent representations created by the trained models.
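A sketch of the data preparation under the same assumptions (the autoencoder object comes from the sketch in Section 2.1); MNIST is approximately class-balanced, so a simple slice preserves a roughly equal representation of the digits:

from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_unsup = x_train[:10000].astype('float32') / 255.0   # 10,000-image subset
x_unsup = x_unsup.reshape(-1, 28, 28, 1)              # shape expected by the encoder
y_analysis = y_train[:10000]    # labels kept only for the later analysis phase

autoencoder.fit(x_unsup, x_unsup, epochs=50, batch_size=128,
                validation_split=0.1)   # 40-60 epochs reported in Section 3.3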
2.3. Training

The success of unsupervised learning was measured by the characteristics of training performance and generative ability. Training performance was measured by the reduction in the value of the cost function over the period of training. Generative performance was evaluated visually, based on the quality of generation of a subset of images in the training dataset. Approximately 70% of the models were successful in generative learning by both measures. A clear correlation was observed between the training and generative characteristics: models with training loss above a certain threshold generally did not succeed in acquiring good generative ability.

The success of generative learning, that is, the ability to generate high-quality images of the types present in the training dataset, indicated that the latent representations produced by the learning models retained significant information about the distribution of the observable data represented by the training dataset.
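As a hedged illustration of this screening criterion, with the 0.15 cutoff an assumption motivated by the loss plateau of 0.12-0.14 reported in Section 3.3:

# screening a trained model by validation loss; visual inspection remains the
# final check of generative ability
x_val = x_test[:1000].astype('float32').reshape(-1, 28, 28, 1) / 255.0
val_loss = autoencoder.evaluate(x_val, x_val, verbose=0)
if val_loss < 0.15:
    print('candidate generative model; confirm by visual inspection of generations')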
2.4. Encoding and generation

A trained model can perform two essential transformations of data: encoding, E(x), from the observable space, i.e., an image x, to a latent position l; and generation, G(l), in the opposite direction, producing an observable image y. The objective of generative learning is to minimize the distance between the training images and their generations by the model, defined by a training metric (cost function) in the observable space.
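Continuing the sketch of Section 2.1, the two transformations can be extracted from a trained Keras autoencoder as sub-models (the layer name 'latent' is an assumption of the sketch):

from tensorflow.keras import layers, models

# E(x): observable image -> latent position
encoder = models.Model(autoencoder.input, autoencoder.get_layer('latent').output)

# G(l): latent position -> observable image, reusing the trained decoder layers
latent_in = layers.Input(shape=(LATENT,))
h = latent_in
start = autoencoder.layers.index(autoencoder.get_layer('latent')) + 1
for layer in autoencoder.layers[start:]:
    h = layer(h)
generator = models.Model(latent_in, h)

l = encoder.predict(x_unsup[:1])   # latent position of an image
y = generator.predict(l)           # generated image for that position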
2.5. Sparse representations

As a result of the sparsity constraint imposed in unsupervised generative training, the effective latent representations of observable images were low-dimensional, that is, an observable image was described by the activations of a small number of latent neurons; the observed effective dimensionality with the images in the dataset was 2 to 4 (i.e., two to four non-zero activations of latent neurons).

A sparse latent representation of this type can be described by a stacked space of low-dimensional "slices" [18], indexed by a tuple of activated neurons (i1, i2, i3). For example, an image of digit "2" can be described in a 24-dimensional sparse representation space by the slice index (1, 3, 8) with coordinates (0.011, 0.017, 0.019), which translates to the corresponding activations of the neurons 1, 3 and 8 in the latent layer and nil activations of the other latent neurons.
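A small sketch of this indexing convention (the activation threshold eps is an assumption):

import numpy as np

def slice_of(latent_vec, eps=1e-3):
    """Return the tuple of activated neurons and their activation values."""
    idx = np.flatnonzero(latent_vec > eps)   # eps: assumed activation threshold
    return tuple(idx), latent_vec[idx]

vec = np.zeros(24)
vec[[1, 3, 8]] = [0.011, 0.017, 0.019]
print(slice_of(vec))   # -> ((1, 3, 8), array([0.011, 0.017, 0.019]))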
3. Results

The results in this section were obtained with several instances of models, trained as outlined earlier, that were successful in generative learning. The results pertain to the essential characteristics of the low-dimensional latent representations produced by generative models, such as structure, topology, consistency and others.

3.1. Generative latent structure

Examination of the geometrical and topological structure of the sparse representations of the handwritten digit images produced by generative models confirmed the highly structured character of the representations, closely correlated with the characteristic types of images.

Following the objective of the study to examine the structure of informative generative representations without known concept samples, an approach was developed that allows investigating the structure of the latent representations produced by successfully trained generative models with purely unsupervised methods that do not require knowledge of the semantics, concept, class or any other prior information about the input data. The process of producing such an unsupervised structure (or "generative landscape" of the representation) is based on the identification of a density structure, such as density clusters, in a general sample of encoded sensory inputs, with methods of unsupervised density clustering such as MeanShift [19].

The approach is based on several essential assumptions. The first one is the success of generative learning, reflected by sufficient accuracy and quality of generation. The second is the sparsity of the resulting representations, which provides two essential benefits: a lower dimensionality of the encoded inputs, and a higher decoupling in the structure of the representations, making the structure easier to detect and harness for learning. And finally, an assumption on the composition of the training set: it has to contain a constant number of characteristic types of inputs (i.e., representativity).

To apply methods of density clustering in the latent representation, first a structure of space slices needs to be identified (Section 2.5).
This was done according to the following process (a code sketch follows the list):

• For each three-dimensional slice l = (i1, i2, i3), a subset of significant activations S(l) is identified by the condition Σ aj ≥ f × amax, where Σ aj is the sum of the activations of the slice neurons, amax is the maximum such activation in the slice, and f is a factor, f = 0.25 in the study.

• S(l) is projected on the slice coordinates, resulting in a three-dimensional set Sp(l).

• A density clustering method is applied to the set Sp(l), producing a sequence of density clusters ordered by size, D(l) = { Dk(l) }. The length of the sequence is defined by the clustering method and does not have to be known in advance.

• The process is repeated for the slices with a significant representation of significant activations (i.e., the size of S(l) above a certain threshold, relative to other slices), resulting in a stacked structure of density clusters, the generative landscape D = { D(l) }, with a natural two-dimensional unique index (l, n), where l is the position of the slice and n is the position of the cluster in the slice.
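A hedged sketch of this process under the assumptions of the earlier sketches (three-neuron slices, a 24-dimensional latent layer, and an assumed slice-significance margin), with scikit-learn's MeanShift as the density clustering method:

import numpy as np
from collections import defaultdict
from sklearn.cluster import MeanShift

F = 0.25
codes = encoder.predict(x_unsup)        # latent encodings of the general sample

slices = defaultdict(list)
for c in codes:
    idx = np.sort(np.argsort(c)[-3:])   # three strongest activations (assumed)
    slices[tuple(idx)].append(c[idx])

landscape = {}
for l, pts in slices.items():
    pts = np.array(pts)
    sums = pts.sum(axis=1)                    # total slice activation per sample
    S = pts[sums >= F * sums.max()]           # significant activations S(l)
    if len(S) < 0.002 * len(codes):           # slice significance margin (assumed)
        continue
    ms = MeanShift().fit(S)                   # density clusters D(l) of the slice
    landscape[l] = ms.cluster_centers_        # cluster n of slice l: landscape[l][n]

def decode_center(l, center, latent_dim=24):
    """Embed a cluster center at the slice's neuron positions and generate."""
    v = np.zeros((1, latent_dim))
    v[0, list(l)] = center
    return generator.predict(v)               # image "map" entries of Figure 2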
With the generative landscape produced by the described process, the first task was to examine how the resulting latent structure is correlated with the characteristic types of data in the training dataset. This can be determined by transforming the center positions of the clusters of the landscape D(l) to observable images with the generative transformation G(l) (Section 2.4). Figure 2 shows the resulting "map" of images associated with the identified density structure of the generative landscape.

Figure 2: Generative structure of the latent landscape (vertical axis: slice; horizontal axis: cluster, first 15 clusters)

As can be observed in the visualization of Figure 2, cluster positions were indeed closely associated with characteristic types of images in the training dataset.

3.2. Latent geometry and topology

The identified landscape of the density structure can assist in the examination of the geometry and topology of the sparse latent space.

The first objective was to investigate the connectedness and continuity of the latent regions associated with characteristic types of observable images. To this end, arrays of random positions were created on spheres of a given radius around the cluster centers, producing a "flow" of latent positions from the cluster centers of the landscape outwards. The positions were then transformed to the observable space with the generative transformation, as in the previous section, producing arrays of observable images associated with the latent positions.

Examination of the resulting images led to the conclusion that the generative representations produced by the models were indeed connected and continuous, with well-defined regions associated with specific types of images (Figure 3).

Figure 3: Generative latent landscape, continuity

Examination of different clusters and landscapes produced by different individual models allows the conclusion that consistency and connectedness are general properties of generative latent landscapes.
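A sketch of this probing procedure under the same assumptions (clipping to non-negative activations is an added assumption, since the latent neurons are ReLU-activated in the sketch):

import numpy as np

def sphere_flow(l, center, r, n=32, latent_dim=24):
    """Generate images for n random latent positions at radius r from a center."""
    d = np.random.normal(size=(n, len(center)))
    d /= np.linalg.norm(d, axis=1, keepdims=True)   # unit directions
    pts = np.clip(center + r * d, 0.0, None)        # keep activations non-negative
    v = np.zeros((n, latent_dim))
    v[:, list(l)] = pts
    return generator.predict(v)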

3.3. Structural consistency of latent representations

Latent representations created by generative models can be expected to be specific to the individual learning model due to peculiarities of the training process, for example, the random selection of training samples. At the same time, some essential characteristics of generative representations appeared to be consistent between the learning models.
To investigate the consistency of the latent structure, an analysis of the latent landscapes produced by three independently trained generative models was performed.

The models were trained over 40-60 epochs of unsupervised generative learning with a training set of 10,000 samples, achieving a training plateau at a validation loss of 0.12-0.14 (with a starting value of ~0.7) and good to excellent generative performance on a subset of images; they were not selected by any specific criteria. After completion of the training phase, several successful independently trained models were selected, and the characteristics of the generative landscapes produced with the methods described earlier were measured.

The measured characteristics were: the overall size of the landscape, as the number of identified density clusters with population above a certain margin relative to the size of the training dataset (~2%); recognition, the fraction of the landscape clusters associated with recognizable digits (as discussed in Section 3.1), indicating a correlation of the landscape with the characteristic content of the training set; and representativity of the content of the landscape, such as the presence of all types of digits (completeness) and the distribution of digits between slices and clusters (digits with the highest and lowest population of associated clusters in the landscape). The results are presented in Table 1.
Table 1
Consistency of latent structure

Model   Size   Recognition   Completeness   Population: h / l
  A     474      0.973           True           0.7 / 4.6
  B     396      0.975           True           0.7 / 2.5
  C     485      0.971           True           0.4 / 2.8
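Of these characteristics, the landscape size can be computed mechanically from the structure built in Section 3.1; recognition and completeness were assessed from decoded cluster centers. A small sketch under the earlier assumptions:

# landscape "size" as the number of identified clusters; the population margin
# is applied when building the landscape above (placement of the ~2% cut assumed)
size = sum(len(centers) for centers in landscape.values())

# decode every cluster center for visual assessment of recognition/completeness
center_images = [decode_center(l, c)
                 for l, centers in landscape.items() for c in centers]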
As can be inferred from these results, the latent landscapes of independently trained successful generative models had significant consistency in size, recognition and representation of the characteristic types of images. On the other hand, factors such as the distribution of digits in the slices and clusters, the highest and lowest representation of digits in the clusters, and a number of others tended to be more specific to the individual learning models.

Similar results were previously obtained with several different types of image data, such as geometrical shapes [9], pointing at the likelihood of a general character of the observed effect of categorization in the latent representations of successful generative models by characteristic types of patterns.

3.4. Unsupervised concept learning

The results of the preceding sections, with strong correlations observed between the emergent latent structure of successful generative models and the characteristic types of observable data, can be interpreted as a distillation of "native" or "natural" concepts in the observable data in the process of unsupervised learning with minimization of the generative error. The structure of the latent landscape, as discussed in the preceding sections, can be resolved in an entirely unsupervised process by a number of methods.

It can be concluded from these results that generative learning under certain constraints, and the resulting structure in the informative latent representations, can be used as a foundation for implicit learning of characteristic patterns in the observable data before and without external contextual information about it. These results can also offer insights into the explainability of learning in generative models via the association of learned concepts or classes in the observable data with the native information structure that emerges in the latent representations in the process of unsupervised generative learning.

4. Discussion

The highly structured character of the low-dimensional generative representations produced by successful models of unsupervised generative self-learning observed in this work provides further support for a growing number of results pointing at the importance of informative representations in the processing of sensory information by learning systems of both artificial and biological nature.

In this work the effect was observed with real-world image data of significant complexity, pointing at a general character of the effect. Informative structured representations strongly correlated with characteristic patterns, or concepts, in the sensory data can play an essential role in the emergence and development of intelligent behaviors, including conceptual intelligence, abstraction and communication.

Continuing research in this direction can shed light on common principles of learning for artificial and biological systems and perhaps point a direction toward a generation of learning systems capable of more natural and intuitive learning from direct interaction with the sensory environment [20].
5. References

[1] G. Hinton, S. Osindero, Y.W. Teh, A fast learning algorithm for deep belief nets, Neural Computation 18(7) (2006) 1527–1554.
[2] A. Fischer, C. Igel, Training restricted Boltzmann machines: an introduction, Pattern Recognition 47 (2014) 25–39.
[3] Y. Bengio, Learning deep architectures for AI, Foundations and Trends in Machine Learning 2(1) (2009) 1–127.
[4] A. Coates, H. Lee, A.Y. Ng, An analysis of single-layer networks in unsupervised feature learning, in: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics 15 (2011) 215–223.
[5] M.A. Ranzato, Y.-L. Boureau, S. Chopra, Y. LeCun, A unified energy-based framework for unsupervised learning, in: Proceedings of the 11th International Conference on Artificial Intelligence and Statistics 2, 2007, 371–379.
[6] Q.V. Le, M.A. Ranzato, R. Monga et al., Building high level features using large scale unsupervised learning, arXiv:1112.6209 (2012).
[7] I. Higgins, L. Matthey, X. Glorot, A. Pal et al., Early visual concept learning with unsupervised deep learning, arXiv:1606.05579 (2016).
[8] N. Seddigh, B. Nandy, D. Bennett, Y. Ren, S. Dolgikh et al., A framework & system for classification of encrypted network traffic using machine learning, in: Proceedings of the 15th International Conference on Network and Service Management (CNSM), Halifax, Canada, 2019, 1–5.
[9] S. Dolgikh, Topology of conceptual representations in unsupervised generative models, in: Proceedings of the 26th International Conference on Information Society and University Studies, Kaunas, Lithuania, 2021, 150–157.
[10] J. Shi, J. Xu, Y. Yao, B. Xu, Concept learning through deep reinforcement learning with memory augmented neural networks, Neural Networks 110 (2019) 47–54.
[11] R.C. Rodriguez, S. Alaniz, Z. Akata, Modeling conceptual understanding in image reference games, in: Proceedings of Advances in Neural Information Processing Systems, Vancouver, Canada, 2019, 13155–13165.
[12] T. Yoshida, K. Ohki, Natural images are reliably represented by sparse and variable populations of neurons in visual cortex, Nature Communications 11 (2020) 872.
[13] X. Bao, E. Gjorgiea, L.K. Shanahan et al., Grid-like neural representations support olfactory navigation of a two-dimensional odor space, Neuron 102(5) (2019) 1066–1075.
[14] E.H. Rosch, Natural categories, Cognitive Psychology 4 (1973) 328–350.
[15] Q.V. Le, A tutorial on deep learning: autoencoders, convolutional neural networks and recurrent neural networks, Stanford University, 2015.
[16] Keras: Python deep learning library, URL: https://keras.io, last accessed 2020/11.
[17] Y. LeCun (Courant Institute, NYU), C. Cortes (Google Labs, New York), C.J.C. Burges (Microsoft Research, Redmond, USA), The MNIST database of handwritten digits, 2007.
[18] S. Dolgikh, Low-dimensional representations in unsupervised generative models, in: Proceedings of the 20th International Conference Information Technologies – Applications and Theory (ITAT), Slovakia, 2020, 239–245.
[19] K. Fukunaga, L.D. Hostetler, The estimation of the gradient of a density function, with applications in pattern recognition, IEEE Transactions on Information Theory 21(1) (1975) 32–40.
[20] D. Hassabis, D. Kumaran, C. Summerfield, M. Botvinick, Neuroscience-inspired artificial intelligence, Neuron 95(2) (2017) 245–258.