On Lindenmayer Systems and Autoencoders

Andrej Lucny

Comenius University, Bratislava 84248, Slovakia, lucny@fmph.uniba.sk,
WWW home page: http://dai.fmph.uniba.sk/w/Andrej_Lucny/en

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract: Lindenmayer systems can serve deep learning as more than a generator of simulated datasets. They can provide datasets of images generated from very few parameters, which enables a better study of the latent space, crucial to the majority of deep neural networks. We process a dataset generated by a parametric Lindenmayer system with a convolutional autoencoder. We aim to recognize the values of the Lindenmayer system parameters with its encoder part. Finally, we partially turn a generator based on its decoder part into a neural network that generates images from the dataset given the Lindenmayer system parameters.

1 Introduction

Deep neural networks for vision are typically trained on datasets of annotated images. Preparing them is a manual job that we can sometimes avoid by using a so-called simulated dataset. There are several grammar-based systems that we can use for the simulation [5]. One of them is Lindenmayer systems [8], which have already been used for this purpose [14]. However, this approach raises further questions that we would like to address in this paper. Can we find the parameters of the Lindenmayer system somewhere inside a neural network that processes a dataset produced by the Lindenmayer system? Can we create a neural network that generates the same images as the Lindenmayer system? And could it make them from the parameters of the Lindenmayer system?

1.1 Parametric Lindenmayer Systems

We employ the parametric Lindenmayer system proposed in [10]. It generates rose leaves from eight parameters, of which just two can vary significantly: the angle of the stem of the rose leaf and the angle between the left and right venations and the stem. It has a set of production rules which are applied to an initial axiom iteratively, and within each iteration simultaneously (Table 1). As a result, the system generates the strings shown in Table 2.

Table 1: Production rules for rose-leaf images, borrowed from [10], page 126.

ω0 : [{A(0, 0).}][{A(0, 1).}]
p1 : A(t, d) : d = 0 → .G(LA, RA).[+B(t)G(LC, RC, t).}][+B(t){.]A(t + 1, d)
p2 : A(t, d) : d = 1 → .G(LA, RA).[−B(t)G(LC, RC, t).}][−B(t){.]A(t + 1, d)
p3 : B(t) : t > 0 → G(LB, RB)B(t − 1)
p4 : G(s, r) → G(s ∗ r, r)
p5 : G(s, r, t) : t > 1 → G(s ∗ r, r, t − 1)

Here d = 0 means the left side and d = 1 the right side of the leaf; t is timing. G(length, growth_rate) corresponds to venations. + and − represent rotation. [ and ] define a tree structure of the generated string. Dots represent points on the leaf that are structured into polygons by { and }. LA, LB, LC are parameters for the initial lengths of the main segment, the lateral segment, and the marginal notch; RA, RB, RC represent their growth rates. + and − also have a parameter: the angle between stem and venations. The last parameter is the direction of the stem, which we select when we interpret the string and turn it into an image.

Table 2: Strings that represent rose leaves, generated by the Lindenmayer system. From top to bottom: the axiom, the first, the second, and the third iteration.

[{A(0, 0).}][{A(0, 1).}]

[{.G(5, 1.15).[+B(0)G(3, 1.19, 0).}][+B(0){.]A(1, 0).}][{.G(5, 1.15).[−B(0)G(3, 1.19, 0).}][−B(0){.]A(1, 1).}]

[{.G(5.75, 1.15).[+B(0)G(3, 1.19, 0).}][+B(0){.].G(5, 1.15).[+B(1)G(3, 1.19, 1).}][+B(1){.]A(2, 0).}][{.G(5.75, 1.15).[−B(0)G(3, 1.19, 0).}][−B(0){.].G(5, 1.15).[−B(1)G(3, 1.19, 1).}][−B(1){.]A(2, 1).}]

[{.G(6.6125, 1.15).[+B(0)G(3, 1.19, 0).}][+B(0){.].G(5.75, 1.15).[+G(1.3, 1.25)B(0)G(3, 1.19, 1).}][+G(1.3, 1.25)B(0){.].G(5, 1.15).[+B(2)G(3, 1.19, 2).}][+B(2){.]A(3, 0).}][{.G(6.6125, 1.15).[−B(0)G(3, 1.19, 0).}][−B(0){.].G(5.75, 1.15).[−G(1.3, 1.25)B(0)G(3, 1.19, 1).}][−G(1.3, 1.25)B(0){.].G(5, 1.15).[−B(2)G(3, 1.19, 2).}][−B(2){.]A(3, 1).}]
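To illustrate the rewriting mechanism, the following minimal sketch (an illustrative reconstruction, not the code from our repository) applies rules of Table 1 to a list of (symbol, arguments) modules. Only rules p3, p4 and p5 are shown; p1 and p2 follow the same pattern but also emit bracket, brace and dot modules.

def rewrite(string, LB=1.3, RB=1.25):
    # one simultaneous application of the production rules = one iteration
    out = []
    for symbol, args in string:
        if symbol == 'B' and args[0] > 0:        # p3: B(t) -> G(LB,RB)B(t-1)
            out += [('G', (LB, RB)), ('B', (args[0] - 1,))]
        elif symbol == 'G' and len(args) == 2:   # p4: G(s,r) -> G(s*r,r)
            out.append(('G', (args[0] * args[1], args[1])))
        elif symbol == 'G' and len(args) == 3 and args[2] > 1:
            s, r, t = args                       # p5: G(s,r,t) -> G(s*r,r,t-1)
            out.append(('G', (s * r, r, t - 1)))
        else:                                    # no rule matches: keep module
            out.append((symbol, args))
    return out

string = [('B', (2,)), ('G', (5, 1.15))]         # a simplified starting string
for iteration in range(3):
    string = rewrite(string)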
We turn the generated strings into images in two steps. First, we use turtle graphics, following the symbols G, + and − (go, rotate left, rotate right) structured by [ and ]; in this way, we calculate the exact positions of all dot symbols that represent points. In the second step, we structure these points into polygons following the symbols { and } and draw the polygons. As a result, we get an image containing a rose leaf (Figure 1). By varying the parameters of the Lindenmayer system, we can generate a whole dataset of such images.

Figure 1: Rose-leaf images generated by the Lindenmayer system.
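A sketch of this two-step interpretation follows (again an illustrative reconstruction; our implementation uses OpenCV [1], but the scaling of the turtle coordinates into the 28x28 canvas is omitted here):

import math
import numpy as np
import cv2

def interpret(string, vein_angle, stem_angle, size=28):
    # step 1: turtle graphics over G, + and - inside the [ ] tree,
    # recording the position of every dot symbol
    x, y, a = 0.0, 0.0, stem_angle
    stack, polygons, current = [], [], []
    for symbol, args in string:
        if symbol == 'G':                      # go forward by the segment length
            x += args[0] * math.cos(a)
            y += args[0] * math.sin(a)
        elif symbol == '+': a -= vein_angle    # rotate left
        elif symbol == '-': a += vein_angle    # rotate right
        elif symbol == '[': stack.append((x, y, a))
        elif symbol == ']': x, y, a = stack.pop()
        elif symbol == '.': current.append((x, y))
        elif symbol == '}':                    # close a polygon opened by '{'
            polygons.append(current)
            current = []
    # step 2: draw the collected polygons; note that in Table 1 the { } pairs
    # cross the [ ] boundaries, which a full implementation must track
    img = np.zeros((size, size), np.uint8)
    for poly in polygons:
        if poly:
            cv2.fillPoly(img, [np.int32(poly)], 255)
    return img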
Given any dataset, we can get an idea of the number of parameters that generate it via Principal Component Analysis (PCA) [9]. We regard the two-dimensional images as one-dimensional vectors, i.e. we put their pixels row by row onto one line. Thus we turn the dataset of 28x28 images into a set of 784-dimensional vectors. Then we can calculate their covariance matrix and find its eigenvectors. The corresponding eigenvalues show that far fewer than 784 eigenvectors are significant. In our case, it is enough to consider 8 to 16 eigenvectors (Figure 2). Now we can express each image from the dataset as a sum of the mean and multiples of the eigenvectors (Figure 3). We can also make a generator that turns manually selected values of the eigenvector multipliers into images, but its quality regarding the generation of rose leaves is low.

Figure 2: The eigenvalues confirm that the generated dataset depends on only a few parameters.

Figure 3: Any image in the dataset can be expressed as a sum of the mean and multiples of the eigenvectors.
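The analysis and the eigenimage-based generator can be sketched as follows (illustrative code; the random dataset stands in for our 1498 generated leaf images):

import numpy as np

dataset = np.random.rand(1498, 28, 28)     # placeholder for the leaf images

X = dataset.reshape(len(dataset), -1)      # flatten to (N, 784) vectors
mean = X.mean(axis=0)
# eigendecomposition of the covariance matrix of the pixels
values, vectors = np.linalg.eigh(np.cov((X - mean).T))
values, vectors = values[::-1], vectors[:, ::-1]   # largest eigenvalues first
print(values[:16] / values.sum())          # only a few components matter

# any image is the mean plus multiples of a few significant eigenvectors
k = 16
multipliers = (X[0] - mean) @ vectors[:, :k]
approximation = mean + vectors[:, :k] @ multipliers

# a crude generator: choose the k multipliers by hand (here randomly)
leaf = (mean + vectors[:, :k] @ np.random.uniform(-3, 3, k)).reshape(28, 28)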
1.2 Deep learning and Autoencoders

Deep learning [4] is a young but well-known and very successful part of machine learning, based on artificial neural networks with a specific architectural design. These networks enhance classic neural networks like the perceptron [12], which are theoretically strong but for which processing larger inputs such as images is not tractable in practice. Deep neural networks typically employ a gradual decrease of the data dimension and turn the input image into a feature vector whose dimension is small enough to be processed further in the classic way. This approach is reflected in their architecture by a deep sequence of convolutional layers (which usually implement 3x3 or 5x5 kernel-based operators) interlaced with MaxPooling layers (which are responsible for the dimension reduction, since they replace e.g. 2x2 values by their maximum), followed by a few fully connected layers corresponding to the classic perceptron. The features do not need to be designed manually; they are found automatically in the process of end-to-end training [13], which corresponds to the minimization of a suitable loss function. The feature vector can be regarded as a point in the so-called latent space. We wish similar images to be mapped to close points and different images to distant points in that space. We also want the feature vector to contain as much information about the corresponding image as possible. The trick of pushing the neural network to learn such feature extraction is the core of the whole of deep learning. It can be demonstrated on a neural network called an autoencoder (Figure 4).

Figure 4: Autoencoder.

An autoencoder not only reduces the dimension of the input data into the feature vector but then performs the opposite process and expands the data back to its original size, using UpSampling layers (which replace each value with e.g. 2x2 copies of it). We train it to provide the same output as the given input. If we succeed, then we are sure that the feature vector represents the input image well, because it is possible to generate the image from the feature vector. After such training, we can cut the autoencoder into two parts: encoder and decoder. The encoder turns images into feature vectors and can be combined with a perceptron for classification or detection tasks. The decoder turns feature vectors into images and can be used as a generator of images, even images that have never been presented to the network.

Of course, it is typically difficult to understand the representation in the latent space when we work with real images that have many parameters. Will it be simpler if we present to the autoencoder a dataset generated precisely from a very concrete and small number of parameters (which is what Lindenmayer systems can do for us)? The organization of the latent space is crucial, as shown by advanced versions of the autoencoder such as the variational autoencoder [6]. Therefore, we would like to play with this idea a bit. In the next chapters, we prepare a suitable dataset (chapter 2), train an autoencoder and compare the set of images with the set of their feature vectors (chapter 3), try to recognize the parameters of the Lindenmayer system by the encoder (chapter 4), and turn the decoder from a generator based on feature vectors into a generator based on the parameters of the Lindenmayer system (chapter 5).

2 Dataset preparation

We have employed the Lindenmayer system defined in Table 1 to generate our dataset of rose-leaf images. We have implemented the Lindenmayer system in Python 3.6 using OpenCV 4.3.0 [1]. For simplicity, we have varied mainly the stem angle and the angle between stem and venations; the other parameters can vary only slightly, since otherwise the resulting image is far from a rose leaf. We have also turned the output images into binary form and resized them to 28x28, which enables us to use a proven autoencoder architecture that requires this input size. We have decided that the stem always starts in the top left corner. This decision enables us to process the dataset also with straightforward methods like eigenimages and to compare their results with the autoencoder. Altogether, our dataset had 1498 images; a few samples can be seen in Figure 5. Of course, we have also recorded the parameters used for the generation of each image. In this way, we have created an annotated dataset free of charge.

Figure 5: A few samples from the generated dataset. The images are annotated by the parameters of the Lindenmayer system used for their generation.
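The generation loop can be sketched by reusing the illustrative rewrite and interpret functions above; the parameter sweeps and the file layout here are our assumptions, not the exact values behind the dataset.

import json
import math
import cv2

annotations, images = [], []
for stem_deg in range(0, 360, 15):          # assumed sweep of the stem angle
    for vein_deg in range(30, 75, 5):       # assumed sweep of the venation angle
        string = [('B', (2,)), ('G', (5, 1.15))]   # simplified starting string
        for _ in range(12):
            string = rewrite(string)
        img = interpret(string, math.radians(vein_deg), math.radians(stem_deg))
        _, img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)  # binary form
        images.append(img)
        annotations.append((stem_deg, vein_deg))

# record the annotation "free of charge": parameters alongside images
for i, img in enumerate(images):
    cv2.imwrite('leaf_%04d.png' % i, img)
with open('annotations.json', 'w') as f:
    json.dump(annotations, f)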
3 Autoencoder training

Moving on to deep learning, we start with the training of the autoencoder; for now, we do not work with the image annotations. We utilize a proven autoencoder architecture from [3] [11]. On its input, the neural network receives grayscale images (pixels in the range 0.0-1.0). They are processed by a convolutional layer with sixteen 3x3 kernels, and then the dimension is reduced by a MaxPooling layer. The output is processed by a further convolutional layer with eight kernels and reduced again. This repeats until the input data of shape 28x28x1 is turned, through 28x28x16, 14x14x8 and 7x7x8, into 4x4x8, which is flattened into the 128-dimensional feature vector. Then, as in a mirror, we expand the data by convolutional and UpSampling layers back to the original size 28x28x1 (Figure 6).

For non-linearity, the convolutional layers use the ReLU activation function, except in two places. A sigmoid is used just before the latent space, to ensure that the values in the latent space are from the interval <0.0,1.0>. And there is a sigmoid on the output of the network, not only to let us interpret the output as an image with pixels in the range 0.0-1.0, but also to let us use the binary cross-entropy loss function, which performs better here than the classic MSE.

Figure 6: The architecture of the used autoencoder in detail [11].

Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 28, 28, 1)         0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 28, 28, 16)        160
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 14, 14, 16)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 14, 14, 8)         1160
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 7, 7, 8)           0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 7, 7, 8)           584
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 4, 4, 8)           0
_________________________________________________________________
flatten_1 (Flatten)          (None, 128)               0
_________________________________________________________________
activation_1 (Activation)    (None, 128)               0
_________________________________________________________________
reshape_1 (Reshape)          (None, 4, 4, 8)           0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 4, 4, 8)           584
_________________________________________________________________
up_sampling2d_1 (UpSampling2 (None, 8, 8, 8)           0
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 8, 8, 8)           584
_________________________________________________________________
up_sampling2d_2 (UpSampling2 (None, 16, 16, 8)         0
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 14, 14, 16)        1168
_________________________________________________________________
up_sampling2d_3 (UpSampling2 (None, 28, 28, 16)        0
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 28, 28, 1)         145
=================================================================
Total params: 4,385
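The summary in Figure 6 corresponds to the well-known Keras convolutional autoencoder from [3]; the following sketch of its definition is our reconstruction from that summary, not the exact source:

from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.layers import Flatten, Activation, Reshape
from keras.models import Model

inp = Input((28, 28, 1))
x = Conv2D(16, (3, 3), activation='relu', padding='same')(inp)
x = MaxPooling2D((2, 2), padding='same')(x)                 # 14x14x16
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)                 # 7x7x8
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)                 # 4x4x8
x = Flatten()(x)
latent = Activation('sigmoid')(x)            # the 128-value feature vector
x = Reshape((4, 4, 8))(latent)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)                                 # 8x8x8
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)                                 # 16x16x8
x = Conv2D(16, (3, 3), activation='relu')(x)  # valid padding gives 14x14x16
x = UpSampling2D((2, 2))(x)                                 # 28x28x16
out = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = Model(inp, out)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy',
                    metrics=['accuracy'])
encoder = Model(inp, latent)    # the encoder part: image -> feature vector
# the decoder part can be rebuilt analogously from a 128-value input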
We train the autoencoder with Keras 2.3.1 [3], using TensorFlow 2.1.0 as the backend and the Adadelta batch gradient descent algorithm. After 200 epochs, the accuracy is 98.38% on the training set and 98.60% on the testing set (10% of the samples) (Figure 7). The achieved quality is good (Figure 8). Now the autoencoder can code each image into a vector of 128 floats in the range 0.0-1.0 and decode that vector into a very similar image (Figure 9), so we can continue with splitting it into its two parts: encoder and decoder.

Figure 7: Training of the autoencoder.

Figure 8: Sample input images from our dataset in the top line and the corresponding output images calculated by our autoencoder.

Figure 9: An example of an input image from the dataset, its feature vector in the latent space of the autoencoder, and the corresponding output image. The feature vector (128 values, abbreviated here):
[0.98625815, 0.66299981, 0.99246186, 0.57722825, 0.5, 0.90062493, 0.93622261, 0.5, 0.5, 0.95261234, 0.99352407, 0.5, ..., 0.90829074, 0.9915418, 0.5, 0.5, 0.5, 0.5]

While we can employ the encoder part to generate another dataset that contains the feature vectors, the decoder part can be used as a generator of rose-leaf images. It is not a very handy generator, since we have to set up 128 values 0.0-1.0 properly, but it is possible to generate something like a leaf just from the feature vector values (Figure 10, on the left).

Can we make such a generator handier? Yes, we can, in a similar way to how we created the generator based on eigenimages. We perform PCA on the dataset of the feature vectors and set up just the main components. We express the feature vector as a sum of the mean and multiples of the eigenvectors; then we need to set up manually just the multipliers of a few significant eigenvectors. We have used only eight parameters, from which we calculate the 128 items of the feature vector that we put into the decoder to obtain the corresponding image. This generator is handier, and it provides pretty rose leaves (Figure 10, on the right), though not only them.

Figure 10: On the left: an image generated from 128 manually selected values of the feature vector; the quality is quite poor. On the right: an image generated from the 8 most significant multipliers of the latent-space eigenvectors; the quality is better.

4 Recognition of the Lindenmayer system parameters

Though we can now generate rose leaves from a few parameters, it is hopeless to look for the parameters of the Lindenmayer system among them. Neither a parameter of the latent space nor a multiplier of its eigenvectors directly corresponds to a parameter of the Lindenmayer system. Even when we perform PCA over the set of the feature vectors calculated by the encoder from the images in the dataset, we find that it has the same distribution of the main components (Figure 11).

Figure 11: The eigenvalues of the latent space show that the encoder does not reduce the number of parameters (compare to Figure 2).

However, we can easily reveal that they are not so far from them. In the beginning, we aimed to train a perceptron to map the feature vectors to the Lindenmayer system parameters. Though the approach was operational, we later found it over-engineered: linear regression provides results as good as the perceptron here. In both cases, we can sufficiently recognize the stem angle: 97% by regression and 99% by the perceptron. On the other hand, we have failed to recognize the angle between stem and venations, perhaps due to the small resolution and binary form of the images, a limitation coming from the architecture used and from our hardware.

Linear regression can be added to the encoder neural network as one fully connected layer without bias and with the linear activation function. In this way, we have constructed a neural network that gets an image generated by the Lindenmayer system and recognizes the values of the Lindenmayer system parameters (Figure 12).

Figure 12: The architecture of the recognizer of the Lindenmayer system parameters.
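Attaching the regression to the encoder can be sketched as follows (illustrative code; images is assumed to hold the dataset with shape (N, 28, 28, 1) and params its annotation with shape (N, 8)):

import numpy as np
from keras.layers import Dense
from keras.models import Model

# feature vectors of the whole dataset, computed by the trained encoder
features = encoder.predict(images)
# least-squares fit of a linear map: feature vector -> L-system parameters
W, _, _, _ = np.linalg.lstsq(features, params, rcond=None)

# append the regression as a bias-free Dense layer with linear activation
out = Dense(8, activation='linear', use_bias=False)(encoder.output)
recognizer = Model(encoder.input, out)
recognizer.layers[-1].set_weights([W])   # image -> L-system parameters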
5 Neural network generating images from the Lindenmayer system parameters

Though the recognition of the Lindenmayer system parameters from the feature vector is straightforward, the inverse operation is not; this is clear even without a trial. However, we can still train a perceptron that approximates the inverse relation. We put all eight parameters from the annotation of our dataset (two of which vary significantly and six of which are almost constant) on the perceptron input and expect the corresponding feature vector (128 values), calculated by the encoder from the dataset image. Then we search for a suitable number of hidden layers and suitable numbers of neurons in those layers. We have trained each such candidate architecture, following mainly the validation loss, since the accuracy was very low (up to 40%). Fortunately, this does not mean that the trained network does not work: some items of the feature vector are less important than others, and the error on them can be high without a bad impact. Finally, we have used a perceptron with two hidden layers, each containing 256 neurons with the hyperbolic tangent activation function. When we joined the perceptron and the decoder, we got a neural network (Figure 13) that can generate images from the parameters of the Lindenmayer system, namely from the stem angle (Figure 14).

Figure 13: The architecture of the generator of images from the Lindenmayer system parameters.

Figure 14: Generating images from the Lindenmayer system parameters.
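A sketch of this inverse mapping and of joining it with the decoder (illustrative code; decoder is assumed to be the decoder part of the trained autoencoder, and the output activation and loss here are our assumptions):

from keras.layers import Input, Dense
from keras.models import Model

# perceptron: 8 L-system parameters -> 128-value feature vector
p_in = Input((8,))
h = Dense(256, activation='tanh')(p_in)
h = Dense(256, activation='tanh')(h)
p_out = Dense(128, activation='sigmoid')(h)   # latent values lie in <0.0,1.0>
perceptron = Model(p_in, p_out)
perceptron.compile(optimizer='adadelta', loss='binary_crossentropy')

# train against the feature vectors produced by the encoder
perceptron.fit(params, encoder.predict(images),
               epochs=200, validation_split=0.1)

# join with the decoder: parameters -> feature vector -> image
generator = Model(p_in, decoder(perceptron(p_in)))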
6 Conclusion

In this paper, we have dealt with the potential of Lindenmayer systems to pose attractive questions related to deep learning. We have prepared a dataset generated by a Lindenmayer system; thus we got its annotation, in the form of the Lindenmayer system parameters, free of charge. Then we used the dataset to train a convolutional autoencoder. Further, we investigated the relationship between its latent space (feature vectors) and the Lindenmayer system parameters. We found that at least some parameters of the Lindenmayer system can easily be recognized from the feature vectors. Finally, we tried to create a neural-network-based generator analogous to the Lindenmayer system, i.e. a neural network that generates the same images as the Lindenmayer system from the Lindenmayer system parameters. This last job was only partially successful. Our future work should concentrate on the hyper-parameters of the autoencoder architecture: we need an operational architecture that accepts a larger input image and has a latent space as small as possible, containing just parameters that directly correspond to those of the Lindenmayer system.

All code developed during the preparation of this paper is available on GitHub: https://github.com/andylucny/On-Lindenmayer-Systems-and-Autoencoders.git

Acknowledgement

This research was supported by the project VEGA 1/0796/18.

References

[1] Bradski, G.: The OpenCV Library. Dr. Dobb's Journal of Software Tools, (2000)
[2] Brownlee, J.: Deep Learning for Computer Vision. Edition v1.4, machinelearningmastery.com, 2019
[3] Chollet, F.: Deep Learning with Python. Manning Publications Co., Greenwich, CT, USA, 2017
[4] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, 2016
[5] Kelemen, J., Kelemenova, A., Mitrana, V.: Towards Biolinguistics. Grammars 4 (2001), pp. 187–292
[6] Kingma, D., Welling, M.: An Introduction to Variational Autoencoders. Foundations and Trends in Machine Learning, Vol. 12 (2019), No. 4, pp. 307–392
[7] Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems 25(2), (2012)
[8] Lindenmayer, A.: Mathematical models for cellular interaction in development. J. Theoret. Biology 18, (1968), pp. 280–315
[9] Pearson, K.: On Lines and Planes of Closest Fit to Systems of Points in Space. Philosophical Magazine 2 (11), (1901), pp. 559–572
[10] Prusinkiewicz, P., Lindenmayer, A., Hanan, J.: The Algorithmic Beauty of Plants. New York: Springer-Verlag, 1990
[11] Rosebrock, A.: Deep Learning for Computer Vision with Python, ImageNet Bundle. 2nd edition, PyImageSearch, 2018
[12] Rosenblatt, F.: The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Cornell Aeronautical Laboratory, Psychological Review, v65, No. 6, (1958), pp. 386–408
[13] Rumelhart, D., Hinton, G., Williams, R.: Learning internal representations by error propagation. In: Parallel Distributed Processing, Vol. 1: Foundations. MIT Press, Cambridge, MA, 1986
[14] Ubbens, J., Cieslak, M., Prusinkiewicz, P., Stavness, I.: The use of plant models in deep learning: an application to leaf counting in rosette plants. Plant Methods 14(1), (2018)