=Paper=
{{Paper
|id=Vol-3611/paper4
|storemode=property
|title=Sparse generative representations of handwritten digits
|pdfUrl=https://ceur-ws.org/Vol-3611/paper4.pdf
|volume=Vol-3611
|authors=Serge Dolgikh
|dblpUrl=https://dblp.org/rec/conf/ivus/Dolgikh22
}}
==Sparse generative representations of handwritten digits==
Serge Dolgikh
National Aviation University, 1 Lubomyra Huzara Ave, 03058 Kyiv, Ukraine
IVUS 2022: 27th International Conference on Information Technology, May 12, 2022, Kaunas, Lithuania. EMAIL: sdolgikh@nau.edu.ua (S. Dolgikh); ORCID: 0000-0001-5929-8954 (S. Dolgikh)

Abstract

We investigated the process of unsupervised generative learning and the structure of informative generative representations of images of handwritten digits (MNIST dataset). Learning models with the architecture of a sparse convolutional autoencoder, with constraints to produce low-dimensional representations, achieved successful generative learning demonstrated by high accuracy of generation of images. A well-defined, continuous and connected structure of generative representations was observed and described. Structured informative representations of unsupervised generative models can be an effective platform for the investigation of origins of intelligent behaviors in artificial and biological learning systems.

Keywords

Artificial neural networks, generative machine learning, representation learning, clustering

1. Introduction

Representation learning with the objective to identify informative elements in the observable data has a well-established record in machine learning. Informative representations obtained with Restricted Boltzmann Machines (RBM), Deep Belief Networks (DBN) [1, 2], different flavors of autoencoders [3] and other models allowed to improve the accuracy of supervised learning [4]. The relations between learning and statistical thermodynamics were studied in [5] and other works, leading to an understanding of a deep connection between learning processes and principles of information theory and statistics.

In experimental studies, a range of results was reported, such as the "cat experiment" that demonstrated spontaneous emergence of concept sensitivity on a single-neuron level in unsupervised deep learning with image data [6]. Disentangled representations were produced and discussed [7] with a deep variational autoencoder and different types of data, pointing at the possibility of a general nature of the effect. Concept-associated structure was observed in latent representations of Internet network traffic [8] and images [6,7,9], as well as in a number of other results with different types of data and applications [10,11].

These results demonstrated that the structure that emerges in the latent representations created by models of generative learning, in the process of unsupervised self-learning with minimization of generative error, can have intrinsic associations with characteristic patterns in the observable data and, perhaps, can be used as a foundation for methods and processes that use these associations for improved efficiency.

Interestingly, these observations in unsupervised machine learning were paralleled by recent works on biological sensory networks [12,13] that demonstrated the commonality of low-dimensional representations in the processing of sensory information by mammals, including humans.

These previous findings prompted and stimulated an investigation into the process of production and the essential characteristics of low-dimensional informative latent representations obtained with neural network models of unsupervised generative self-learning, including the formation of a conceptual structure in the latent representations of the sensory environment of the learner.

The questions investigated in this work were the following: what are the characteristics of the latent representations of successful generative models? Is there an association between the characteristic patterns (or higher-level concepts) in the input data and the latent distributions produced by learning models? What structure can be identified in the latent representations with entirely unsupervised methods, without prior knowledge of the conceptual content of the input data?

These questions were approached with generative models of deep neural network architecture and a dataset of real, unprocessed images of handwritten digits (the MNIST dataset), used widely in studies of machine intelligence systems. The intent of the study is to understand how successful common generative models of unsupervised self-learning, even of limited complexity, can produce informative and structured representations of input data modeling sensory environments.

The novelty of the presented approach is associated with using a "generic" generative architecture with clearly defined directions of possible incremental variation and evolution. Using this type of architecture can provide answers to essential questions of how the more complex architectures reported in the cited results could have developed in realistic learning systems.

Throughout the work, externally known types or patterns in the input data that models the observable sensory environment of a learning system will be referred to as "higher-level concepts" or "external concepts"; they signify a class of a sample in the input space that is defined by an external process, outside of the model. An example of an external concept for an image with a geometric shape can be the word "triangle" or a specific symbol. In contrast, structures in the latent representations of the observable space that can be identified entirely by unsupervised means, without any external or prior information, will be referred to as "internal", "natural" or "native" concepts [14]. A priori, there is no reason to assume that external and native concepts are related or correlated, so the relation between the external and native concepts is an interesting and intriguing question in its own right.
2. Methods and data

2.1. Model architecture

A convolutional autoencoder model [15] used in this work had an encoding stage with convolution-pooling layers followed by several layers of dimensionality reduction, ending in a sparse encoding layer of size 20–25 that produces an effective low-dimensional latent representation described by the activations of neurons in the encoding layer.

A sparse training penalty was applied to the latent activations as L1 regularization, resulting in 2 to 4 neuron activations for most images in the dataset. The decoding / generative stage was fully symmetrical to the encoder. The diagram of the architecture used in this work is shown in Figure 1.

Figure 1: Sparse convolutional autoencoder model

Overall, the models had 21 layers and ~9 × 10^4 trainable parameters. The models were implemented in the Keras / TensorFlow programming package [16] and trained for minimization of the generative error, i.e., an average norm of the difference between input images and the output produced by the model on the unsupervised training set, defined by the categorical cross-entropy (CCE) cost function.
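As an illustration of this architecture, the following is a minimal sketch of such a model in Keras / TensorFlow [16], assuming 28 × 28 grayscale MNIST inputs. The layer counts, filter sizes, optimizer and the strength of the L1 activity penalty are illustrative assumptions rather than the exact 21-layer configuration used in the paper.

```python
# Minimal sketch of a sparse convolutional autoencoder (Section 2.1).
# Layer sizes and the L1 penalty strength are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, regularizers, Model

LATENT_SIZE = 25          # sparse encoding layer of size 20-25
SPARSITY_L1 = 1e-4        # assumed strength of the L1 activity penalty

inputs = layers.Input(shape=(28, 28, 1))

# Encoding stage: convolution-pooling layers ...
x = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
x = layers.MaxPooling2D(2)(x)
x = layers.Flatten()(x)

# ... followed by dense dimensionality reduction down to the sparse encoding
# layer; the L1 activity regularizer drives most latent activations to zero.
x = layers.Dense(128, activation="relu")(x)
encoded = layers.Dense(LATENT_SIZE, activation="relu",
                       activity_regularizer=regularizers.l1(SPARSITY_L1),
                       name="encoding")(x)

# Decoding / generative stage, symmetrical to the encoder.
x = layers.Dense(128, activation="relu")(encoded)
x = layers.Dense(7 * 7 * 8, activation="relu")(x)
x = layers.Reshape((7, 7, 8))(x)
x = layers.Conv2DTranspose(8, 3, strides=2, activation="relu", padding="same")(x)
x = layers.Conv2DTranspose(16, 3, strides=2, activation="relu", padding="same")(x)
outputs = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)

autoencoder = Model(inputs, outputs)
# The paper minimizes generative error defined by the categorical
# cross-entropy (CCE) cost function on the unsupervised training set.
autoencoder.compile(optimizer="adam", loss="categorical_crossentropy")
# autoencoder.fit(x_train, x_train, epochs=40, batch_size=128)  # x_train: unlabeled images
```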
2.2. Data

The dataset of images used in the study, MNIST [17], consisted of three sets of images (training, validation and test) of handwritten digits from 0 to 9, produced by different real individuals. The models were trained on a subset of 10,000 images, with approximately equal representation of all digits.

To ensure the entirely unsupervised character of the latent representations created by trained models, labeled samples were not used in the phase of generative training of the models, but only in the analysis of distributions of higher-level concepts in the latent representations created by trained models.

2.3. Training

The success of unsupervised learning was measured by the characteristics of training performance and generative ability. Training performance was measured by the reduction in the value of the cost function over the period of training. Generative performance was evaluated visually, based on the quality of generation of a subset of images in the training dataset. Approximately 70% of models were successful in generative learning by both measures. A clear correlation was observed between the training and generative characteristics: models with training loss above a certain threshold generally did not succeed in acquiring good generative ability.

Success of generative learning, that is, the ability to generate high-quality images of the types present in the training dataset, indicated that the latent representations produced by the learning models retained significant information about the distribution of the observable data represented by the training dataset.

2.4. Encoding and generation

A trained model can perform two essential transformations of data: encoding, E(x), from the observable space, i.e., an image x, to the latent position l; and generation, G(l), in the opposite direction, producing an observable image y. The objective of generative learning is to minimize the distance between training images and their generations by the model, defined by a training metric (cost function) in the observable space.
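The two transformations can be obtained from a trained autoencoder by splitting it at the encoding layer. The sketch below is one possible way to do this in Keras, assuming the `autoencoder` and `LATENT_SIZE` of the previous sketch with the encoding layer named "encoding"; `x_test` stands for any array of 28 × 28 × 1 images and is not defined here.

```python
# Sketch of the two transformations of Section 2.4: encoding E(x) and
# generation G(l), built by splitting the trained autoencoder.
import tensorflow as tf
from tensorflow.keras import Model

# E(x): observable image -> latent position l
encoder = Model(autoencoder.input, autoencoder.get_layer("encoding").output)

# G(l): latent position -> observable image y, obtained by replaying the
# decoder layers that follow the encoding layer (each has a single input here).
latent_in = tf.keras.Input(shape=(LATENT_SIZE,))
x = latent_in
past_encoding = False
for layer in autoencoder.layers:
    if past_encoding:
        x = layer(x)
    if layer.name == "encoding":
        past_encoding = True
generator = Model(latent_in, x)

latent = encoder.predict(x_test[:10])     # E(x)
generated = generator.predict(latent)     # G(E(x)), compared to x during training
```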
2.5. Sparse representations

As a result of the sparsity constraint imposed in unsupervised generative training, the effective latent representations of observable images were low-dimensional; that is, an observable image was described by the activations of a small number of latent neurons. The observed effective dimensionality with the images in the dataset was 2 to 4 (i.e., two to four non-zero activations of latent neurons).

A sparse latent representation of this type can be described by a stacked space of low-dimensional "slices" [18], indexed by a tuple of activated neurons (i1, i2, i3). For example, an image of digit "2" can be described in a 24-dimensional sparse representation space by the index (1, 3, 8) with coordinates (0.011, 0.017, 0.019), which translates to the corresponding activations of neurons 1, 3 and 8 in the latent layer and nil activations of the other latent neurons.
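A sketch of how such a slice index can be read off a latent vector is given below; `slice_index` is a hypothetical helper rather than code from the paper, and the activation cutoff and the limit of three dimensions are illustrative assumptions.

```python
# Reading off the sparse "slice" index of Section 2.5 from a latent vector:
# the indices of the few significantly active neurons form the slice tuple
# (i1, i2, i3), and their values are the coordinates within the slice.
import numpy as np

def slice_index(latent_vector, cutoff=1e-3, max_dims=3):
    """Return (active neuron indices, their activations) for one encoded image."""
    active = np.flatnonzero(latent_vector > cutoff)
    # keep the strongest activations if more than max_dims neurons fire
    active = active[np.argsort(latent_vector[active])[::-1][:max_dims]]
    active = np.sort(active)
    return tuple(active.tolist()), latent_vector[active]

# Example: a 24-dimensional code with neurons 1, 3 and 8 active, as in the
# digit "2" example of Section 2.5.
code = np.zeros(24)
code[[1, 3, 8]] = [0.011, 0.017, 0.019]
print(slice_index(code))   # ((1, 3, 8), array([0.011, 0.017, 0.019]))
```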
3. Results

The results in this section were obtained with several instances of models, trained as outlined earlier, that were successful in generative learning. The results pertain to essential characteristics of the low-dimensional latent representations produced by generative models, such as structure, topology, consistency and others.

3.1. Generative latent structure

Examination of the geometrical and topological structure of the sparse representations of the handwritten digit images produced by generative models confirmed a highly structured character of the representations, closely correlated with characteristic types of images.

Following the objective of the study to examine the structure of informative generative representations without known concept samples, an approach was developed that allows investigating the structure in the latent representations produced by successfully trained generative models by purely unsupervised methods that do not require knowledge of the semantics, concept, class or any other prior information about the input data. The process of producing such an unsupervised structure (the "generative landscape" of the representation) is based on the identification of a density structure, such as density clusters, in a general sample of encoded sensory inputs with methods of unsupervised density clustering such as MeanShift [19].

The approach is based on several essential assumptions. The first is the success of generative learning, reflected by sufficient accuracy and quality of generation. The second is the sparsity of the resulting representations, which provides two essential benefits: a lower dimensionality of the encoded inputs, and a higher decoupling in the structure of representations, making it easier to detect and harness for learning. Finally, an assumption is made on the composition of the training set: that it contains a constant number of characteristic types of inputs (i.e., representativity).

To apply methods of density clustering in the latent representation, first a structure of space slices needs to be identified (Section 2.5). This was done according to the following process (a code sketch of the procedure is given at the end of this subsection):
• For each three-dimensional slice l = (i1, i2, i3), a subset of significant activations S(l) was identified by the condition Σ aj ≥ f × amax, where Σ aj is the sum of activations of the slice neurons for a given input, amax is the maximum such activation in the slice, and f is a factor (f = 0.25 in the study).
• S(l) was projected on the slice coordinates, resulting in a three-dimensional set Sp(l).
• A density clustering method was applied to the set Sp(l), producing a sequence of density clusters ordered by size, D(l) = { Dk(l) }. The length of the sequence is defined by the clustering method and does not have to be known in advance.
• The process was repeated for slices with a significant share of significant activations (i.e., the size of S(l) above a certain threshold, relative to other slices), resulting in a stacked structure of density clusters, the generative landscape D = { D(l) }, with a natural two-dimensional unique index (l, n), where l is the position of the slice and n is the position of the cluster in the slice.

With the generative landscape produced by the described process, the first task was to examine how the resulting latent structure is correlated with characteristic types of data in the training dataset. This can be determined by transforming the center positions of the clusters of the landscape D(l) to observable images with the generative transformation G(l) (Section 2.4). Figure 2 shows the resulting "map" of images associated with the identified density structure of the generative landscape.

Figure 2: Generative structure of the latent landscape (vertical axis: slice; horizontal axis: cluster, first 15 clusters)

As can be observed in the visualization of Figure 2, cluster positions were indeed closely associated with characteristic types of images in the training dataset.
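The following is a possible sketch of this landscape-construction procedure, assuming the `encoder` and `slice_index` helpers introduced in the earlier sketches and an unlabeled image array `x_train`; the minimum slice population of 20 samples is an illustrative assumption, while the significance factor f = 0.25 follows the text.

```python
# Sketch of the landscape construction of Section 3.1: group encoded inputs by
# slice, keep significant activations S(l), project to Sp(l), cluster with
# MeanShift [19], and collect the clusters into the landscape D = {D(l)}.
from collections import defaultdict
import numpy as np
from sklearn.cluster import MeanShift

F = 0.25                                      # significance factor from the text
codes = encoder.predict(x_train)              # latent positions of the sample

# Group encoded inputs by their slice (tuple of activated neurons), keeping the
# projection of each code onto the slice coordinates.
slices = defaultdict(list)
for code in codes:
    idx, coords = slice_index(code)
    if len(idx) == 3:                         # three-dimensional slices
        slices[idx].append(coords)

landscape = {}                                # slice index l -> cluster centers D(l)
for idx, points in slices.items():
    points = np.asarray(points)
    totals = points.sum(axis=1)               # total slice activation per input
    significant = points[totals >= F * totals.max()]   # S(l), projected: Sp(l)
    if len(significant) < 20:                 # skip weakly represented slices
        continue
    ms = MeanShift().fit(significant)
    landscape[idx] = ms.cluster_centers_      # density clusters of the slice

# Each cluster center can then be embedded back into the full latent vector and
# decoded with G(l) to produce the image "map" of Figure 2.
```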
3.2. Latent geometry and topology

The identified landscape of the density structure can assist in the examination of the geometry and topology of the sparse latent space.

The first objective was to investigate the connectedness and continuity of the latent regions associated with characteristic types of observable images. To this end, arrays of random positions were created on spheres of a given radius around the cluster centers, thus producing a "flow" of latent positions from the cluster centers of the landscape outwards. The positions were then transformed to the observable space with the generative transformation, as in the previous section, producing arrays of observable images associated with the latent positions.

Examination of the resulting images allowed to conclude that the generative representations produced by the models were indeed connected and continuous, with well-defined regions associated with specific types of images (Figure 3).

Figure 3: Generative latent landscape, continuity

Examination of different clusters and landscapes produced by different individual models allows to conclude that consistency and connectedness are a general property of generative latent landscapes.
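A sketch of this continuity probe is given below, assuming the `generator` and `LATENT_SIZE` of the earlier sketches; `slice_idx` and `center` would come from the landscape built in Section 3.1, and the sphere radii and the way the 3-dimensional slice coordinates are embedded back into the full latent vector are illustrative assumptions.

```python
# Sketch of the continuity probe of Section 3.2: random latent positions are
# drawn on spheres of growing radius around a cluster center and decoded back
# to images with G(l) for visual inspection (Figure 3).
import numpy as np

def sphere_samples(center, radius, n_samples=16):
    """Random points on a sphere of the given radius around a latent position."""
    directions = np.random.normal(size=(n_samples, center.shape[0]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    return center + radius * directions

def probe_cluster(slice_idx, center, radii=(0.005, 0.01, 0.02)):
    """Decode a 'flow' of latent positions moving outwards from a cluster center."""
    images = []
    for r in radii:
        for point in sphere_samples(np.asarray(center), r):
            # embed the 3-d slice coordinates into the full sparse latent vector
            latent = np.zeros((1, LATENT_SIZE))
            latent[0, list(slice_idx)] = np.clip(point, 0.0, None)
            images.append(generator.predict(latent, verbose=0)[0])
    return np.asarray(images)   # inspected visually for continuity
```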
3.3. Structural consistency of latent representations

While latent representations created by generative models can be expected to be specific to individual learning models due to peculiarities of the training process (for example, the random selection of training samples), some essential characteristics of generative representations appeared to be consistent between the learning models.

To investigate the consistency of the latent structure, an analysis of latent landscapes produced with three independently trained generative models was performed. The models were trained over 40–60 epochs of unsupervised generative learning with a training set of 10,000 samples, achieving a training plateau at a validation loss of 0.12–0.14 (with a starting value of ~0.7) and good to excellent generative performance on a subset of images; they were not selected by any specific criteria. After completion of the training phase, several successful independently trained models were selected and the characteristics of the generative landscapes produced with the methods described earlier were measured.

The measured characteristics were: the overall size of the landscape, as the number of identified density clusters with population above a certain margin relative to the size of the training dataset (~2%); recognition, the fraction of the landscape clusters associated with recognizable digits (as discussed in Section 3.1), indicating a correlation of the landscape with the characteristic content of the training set; and representativity of the content of the landscape, such as the presence of all types of digits (completeness) and the distribution of digits between slices and clusters (digits with the highest and lowest population of associated clusters in the landscape). The results are presented in Table 1.

Table 1: Consistency of latent structure
Model | Size | Recognition | Completeness | Population: high / low
A | 474 | 0.973 | True | 0.7 / 4.6
B | 396 | 0.975 | True | 0.7 / 2.5
C | 485 | 0.971 | True | 0.4 / 2.8

As can be inferred from these results, the latent landscapes of independently trained successful generative models had significant consistency in the size, recognition and representation of characteristic types of images. On the other hand, factors such as the distribution of digits in the slices and clusters, the highest and lowest representation of digits in the clusters, and a number of others tended to be more specific to individual learning models.

Similar results were previously obtained with several different types of image data, such as geometrical shapes [9], pointing at the likelihood of a general character of the observed effect of categorization in the latent representations of successful generative models by characteristic types of patterns.

3.4. Unsupervised concept learning

The results of the preceding sections, with strong correlations observed between the emergent latent structure of successful generative models and characteristic types of observable data, can be interpreted as a distillation of "native" or "natural" concepts in the observable data in the process of unsupervised learning with minimization of generative error. The structure, or the latent landscape, as discussed in the preceding sections, can be resolved in an entirely unsupervised process by a number of methods.

It can be concluded from these results that generative learning under certain constraints, and the resulting structure in the informative latent representations, can be used as a foundation for implicit learning of characteristic patterns in the observable data before, and without, external contextual information about it. These results can also offer insights into the explainability of learning in generative models via the association of learned concepts or classes in the observable data with the native information structure that emerges in the latent representations in the process of unsupervised generative learning.

4. Discussion

The highly structured character of low-dimensional generative representations produced by successful models of unsupervised generative self-learning observed in this work provides further support for a growing number of results pointing at the importance of informative representations in the processing of sensory information by learning systems of both artificial and biological nature.

In this work the effect was observed with real-world image data of significant complexity, pointing at a general character of the effect. Informative structured representations strongly correlated with characteristic patterns, or concepts, in the sensory data can play an essential role in the emergence and development of intelligent behaviors, including conceptual intelligence, abstraction and communications.

Continuing research in this direction can shed light on common principles of learning for artificial and biological systems and perhaps point a direction towards a generation of learning systems capable of more natural and intuitive learning from direct interaction with the sensory environment [20].

5. References

[1] G. Hinton, S. Osindero, Y.W. Teh, A fast learning algorithm for deep belief nets, Neural Computation 18(7) (2006) 1527–1554.
[2] A. Fischer, C. Igel, Training restricted Boltzmann machines: an introduction, Pattern Recognition 47 (2014) 25–39.
[3] Y. Bengio, Learning deep architectures for AI, Foundations and Trends in Machine Learning 2(1) (2009) 1–127.
[4] A. Coates, H. Lee, A.Y. Ng, An analysis of single-layer networks in unsupervised feature learning, in: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics 15 (2011) 215–223.
[5] M.A. Ranzato, Y.-L. Boureau, S. Chopra, Y. LeCun, A unified energy-based framework for unsupervised learning, in: Proceedings of the 11th International Conference on Artificial Intelligence and Statistics 2, 2007, 371–379.
[6] Q.V. Le, M.A. Ranzato, R. Monga et al., Building high level features using large scale unsupervised learning, arXiv 1112.6209 (2012).
[7] I. Higgins, L. Matthey, X. Glorot, A. Pal et al., Early visual concept learning with unsupervised deep learning, arXiv 1606.05579 (2016).
[8] N. Seddigh, B. Nandy, D. Bennett, Y. Ren, S. Dolgikh et al., A framework & system for classification of encrypted network traffic using Machine Learning, in: Proceedings of the 15th International Conference on Network and Service Management (CNSM), Halifax, Canada, 2019, 1–5.
[9] S. Dolgikh, Topology of conceptual representations in unsupervised generative models, in: Proceedings of the 26th International Conference on Information Society and University Studies, Kaunas, Lithuania, 2021, 150–157.
[10] J. Shi, J. Xu, Y. Yao, B. Xu, Concept learning through deep reinforcement learning with memory augmented neural networks, Neural Networks 110 (2019) 47–54.
[11] R.C. Rodriguez, S. Alaniz, Z. Akata, Modeling conceptual understanding in image reference games, in: Proceedings of Advances in Neural Information Processing Systems, Vancouver, Canada, 2019, 13155–13165.
[12] T. Yoshida, K. Ohki, Natural images are reliably represented by sparse and variable populations of neurons in visual cortex, Nature Communications 11 (2020) 872.
[13] X. Bao, E. Gjorgiea, L.K. Shanahan et al., Grid-like neural representations support olfactory navigation of a two-dimensional odor space, Neuron 102(5) (2019) 1066–1075.
[14] E.H. Rosch, Natural categories, Cognitive Psychology 4 (1973) 328–350.
[15] Q.V. Le, A tutorial on deep learning: autoencoders, convolutional neural networks and recurrent neural networks, Stanford University, 2015.
[16] Keras: Python deep learning library, URL: https://keras.io, last accessed 2020/11.
[17] Y. LeCun, C. Cortes, C.J.C. Burges, The MNIST database of handwritten digits, 2007.
[18] S. Dolgikh, Low-dimensional representations in unsupervised generative models, in: Proceedings of the 20th International Conference Information Technologies – Applications and Theory (ITAT), Slovakia, 2020, 239–245.
[19] K. Fukunaga, L.D. Hostetler, The estimation of the gradient of a density function, with applications in pattern recognition, IEEE Transactions on Information Theory 21(1) (1975) 32–40.
[20] D. Hassabis, D. Kumaran, C. Summerfield, M. Botvinick, Neuroscience-inspired Artificial Intelligence, Neuron 95(2) (2017) 245–258.