Object image recognition using a multilayer perceptron combined with singular value decomposition

Vasyl Lytvyn1,†, Ivan Peleshchak1,*,†, Roman Peleshchak1,†, Iryna Shakleina1,†, Nazarii Mozol1,*,†, Dmytro Svyshch1,*,†

1 Lviv Polytechnic National University, 12 Stepana Bandera Street, Lviv, 79013, Ukraine

Abstract
In this paper, a method for recognizing object images with high accuracy using a multilayer perceptron combined with singular value decomposition (SVD) is developed. The neural network was trained by the backpropagation algorithm with the Adam optimizer on the MNIST dataset. Singular value decomposition of the input matrix was used for data preprocessing and for initializing the weights of the network layers, which increased the image recognition accuracy by 2% while using only 18% of the input data compared to the model without SVD (144 px / 784 px ≈ 0.18). In addition, the proposed combined method enables effective short-term training of small neural networks on small photos, unlike existing traditional methods based on the VGG and ResNet architectures. The proposed combined method is especially valuable for image recognition under limited computing resources and training time.

Keywords
multilayer perceptron, singular value decomposition of matrices, object image recognition, initialization of weights, Adam optimizer

1. Introduction

Image recognition is a popular task across many application domains. The main problem in this task is that standard methods for training larger neural networks are often ineffective when resources are limited or when a given accuracy needs to be achieved quickly.

Our work focuses on developing a methodology that combines the SVD method, used for data preprocessing and weight initialization, with a neural network to optimize its training on small photos. The paper emphasizes that the proposed method maximizes performance in the case of short-term training, which is critical when resources are severely limited. Compared to the known neural network architectures (VGG and ResNet), our approach has a much smaller model size, which reduces the required computing resources. Compared to the same neural network model without SVD preprocessing, the proposed approach achieves higher image recognition accuracy. The high performance of the neural network combined with SVD is confirmed by a computer experiment.

The paper discusses in detail the theoretical aspects of SVD and its application in the context of small neural networks, and describes the experiments performed and the results obtained. We show how using SVD for layer weight initialization and data preprocessing can significantly improve the efficiency of training in the early stages, giving small networks a significant advantage in the speed and accuracy of image recognition. The final part is devoted to discussing the prospects for further research and the possible applications of the developed method in various fields.

MoMLeT-2024: 6th International Workshop on Modern Machine Learning Technologies, May 31 - June 1, 2024, Lviv-Shatsk, Ukraine
∗ Corresponding author.
† These authors contributed equally.
vasyl.v.lytvyn@lpnu.ua (V. Lytvyn); ivan.r.peleshchak@lpnu.ua (I. Peleshchak); roman.m.peleshchak@lpnu.ua (R. Peleshchak); ioshakleina@gmail.com (I. Shakleina); nazarii_mozol@icloud.com (N. Mozol); svyshch.d.m@gmail.com (D. Svyshch)
0000-0002-9676-0180 (V. Lytvyn); 0000-0002-7481-8628 (I. Peleshchak); 0000-0002-0536-3252 (R. Peleshchak); 0000-0003-0809-1480 (I. Shakleina); 0009-0003-6770-7609 (N. Mozol); 0009-0004-7882-9676 (D. Svyshch)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

The aim of the work is to develop a combination of a multilayer perceptron with the SVD algorithm for recognizing object images with high accuracy (>99%) and a small value of the loss function (<0.1).

2. Literature analysis

In modern data analysis, classification is one of the most frequently solved tasks, in particular with the help of neural networks and the singular value decomposition method. In the field of pattern recognition, methods that address changes in lighting, pose, and facial expression are key to improving the performance of neural networks. Paper [1] makes a significant contribution to this field by proposing a symmetrical singular value decomposition representation (SSVDR) method for image recognition. This method uses singular value decomposition (SVD) and face symmetry to improve the quality of image recognition under varying lighting conditions. The advantage of the proposed method is its ability to create a homogeneous representation of the original image even under large variations in lighting. The simplicity of the method and its high performance on small sample sizes show that it can be used under limited resources.

In [2], the authors discuss the use of singular value decomposition (SVD) as a means of initializing the parameters of neural networks, as well as an analog of multilayer neural networks in linear algebra. The paper discusses the properties of SVD for linear regression and for the analysis of overdetermined and underdetermined problems. It emphasizes the particular usefulness of SVD for generating an initial solution in the optimization of nonlinear networks and the high performance of SVD compared to other initialization methods.

The authors of [3] investigated the problem of classifying brain images using medical image processing methods. In particular, they propose a method that automatically detects and classifies the grades of brain gliomas. This method combines mutual information with singular value decomposition and automatically selects the set of features to be used in the classifier. The researchers also point out the limitations of their model, noting that MI-ASVD [3] works well only on one specific dataset and may not be effective on others. They therefore recommend using different classifiers and expanding the datasets to improve classification accuracy in different clinical cases. They also mention significant requirements for computing resources.

The paper [4] investigates the problem of compression and classification of electrocardiographic (ECG) signals with arrhythmias using the singular value decomposition (SVD) method. It is noted that ECG monitoring systems are widely used in telemedicine applications, where ECG signals are compressed before transmission and storage. A method of decomposing ECG signals with SVD is proposed, after which various classifiers are applied, in particular convolutional neural networks (CNN) and support vector machines (SVM). The SVD technique is used to compress and reconstruct the ECG signal, and the resulting compressed signal is then classified using the two types of classifiers.
The authors of [5] investigate the application of the singular value decomposition (SVD) method to accelerometer data for recognizing falls and activities using one-dimensional convolutional neural networks (CNNs). Three dimensionality reduction methods (SVD, sparse PCA, and kernel PCA) are compared for their effectiveness in extracting features useful for fall and activity recognition. Experiments conducted on three datasets of falls and activities of daily living show that SVD applied to accelerometer data, together with the raw data or the acceleration magnitude vector, gave better accuracy in recognizing falls and activities with a one-dimensional CNN than the other dimensionality reduction methods based on principal component analysis.

Study [6] addresses the problem of efficient initialization of recommender system (RS) algorithms based on singular value decomposition (SVD). The authors propose a general framework for initialization using neural embeddings, in which a low-complexity probabilistic autoencoder network initializes the user and item characteristics; the framework supports both explicit and implicit feedback. The article notes that existing SVD algorithms usually initialize the user/item characteristic vectors with random values and can therefore get stuck in local optima.

When comparing classification approaches based on neural networks with those based on the singular value decomposition method, several important characteristics can be identified:

• Neural networks are known for their ability to adapt to complex dependencies in data and to use deep learning to automatically identify important features. This can lead to high classification accuracy, especially when the data has a complex structure or non-linear dependencies.
• Neural networks may require large computational resources to train the model, a problem that is also partially addressed in our work.
• Neural networks, especially deep ones, are often described as "black boxes", i.e., their decisions are difficult to interpret. This can be a problem in areas where a clear explanatory model is needed. The singular value decomposition method can provide more interpretability, because it isolates the main components of the data and thereby helps to understand which features matter most in the classification process.
• Singular value decomposition (SVD) is widely used to reduce the dimensionality of data and extract the principal components, which improves the efficiency and speed of data processing. However, SVD can be less effective at detecting complex relationships between features, which can lead to lower classification accuracy; in our work this problem is solved by combining SVD with a fully connected neural network.

In contrast to the discussed methods, our work focuses on the use of SVD in small fully connected neural networks for classification. We found that using SVD for weight initialization and image preprocessing can improve the accuracy of neural networks with a small number of hyperparameters. The proposed approach is particularly productive for applications with time or computational resource constraints. Our work uses SVD to improve performance on small datasets and small images, where neural networks are prone to overfitting. Our research is also aimed at improving the training procedure.
Based on computer experiments, we show that the use of SVD in combination with a fully connected neural network offers significant advantages for fast training of neural networks on small datasets.

3. Materials and methods

3.1 Definition of SVD

Singular value decomposition is a powerful mathematical tool that is widely used in data processing and analysis, including machine learning. The method decomposes an input matrix into three matrices U, S, and V^T, where U and V^T are orthogonal and S is a diagonal matrix of singular values.

Figure 1: Scheme of the input matrix decomposition
Figure 2: Schematic of obtaining each matrix of the decomposition

This decomposition helps to identify the principal components of the data matrix, reduce its dimensionality, and initialize the weights in neural networks. An example of the decomposition is shown in Figure 1 and Figure 2.

3.1.1 Using SVD for data preprocessing

In our study, the SVD method is used to optimize data processing. First, the dimensionality of the input data is reduced before it is fed to the decomposition function. During the decomposition, we extract the most important features of the image: by keeping only the first 144 singular vectors, we significantly reduce the dimensionality while preserving the important information about the image.

3.1.2 Using SVD to initialize the weights

According to the general idea of the work, it would be inefficient to use all the input data to initialize the weights, since the data have already passed the decomposition preprocessing step. We therefore consider which of the decomposition matrices can be used, and for what purpose:

1. U matrix: the left singular vectors of the input matrix, which form an orthonormal basis for the column space. It can be useful for capturing dominant patterns in data, for example in autoencoder architectures, where we try to capture the variance or structure of the data in a reduced-dimensional space.
2. S matrix: a diagonal matrix of singular values that quantify the contribution of each corresponding singular vector. Since these values measure the importance of each singular vector in capturing the variance of the data, initializing the weights with them directly would not be correct: they do not provide a basis, but rather scale the contribution of each basis vector.
3. V^T matrix: the right singular vectors of the input matrix, which form an orthonormal basis for the row space. It can be particularly effective for layer initialization when we want to project the input data into a space that emphasizes the most significant patterns in the training set.

Figure 3: Scheme of reaching the global minimum of the loss function

Given this, we use V^T to extract the most significant patterns from the training data. This places the starting point of training closer to the global minimum of the loss function, which, all else being equal, yields a more accurate result (Figure 3).
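To make the two roles of the decomposition concrete, the following NumPy sketch illustrates feature reduction and a V^T-based layer initialization. It is our own minimal sketch: the shapes (784 input pixels, k = 144 retained features, 128 hidden units) follow the paper, while the exact initialization rule is one plausible reading of this subsection, not the authors' definitive implementation.

```python
import numpy as np

def reduce_features(X, k=144):
    """Keep only the k most significant SVD features of the data matrix X.

    X is assumed to be the flattened training matrix of shape (n_samples, 784).
    """
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    # Project every image onto the first k right singular vectors,
    # i.e., onto the directions of largest variance in the row space.
    return X @ Vt[:k].T, Vt[:k]            # shapes (n_samples, k), (k, 784)

def init_first_layer(X_reduced, n_hidden=128):
    """One plausible V^T-based initialization of the input-to-hidden weights."""
    _, _, Vt = np.linalg.svd(X_reduced, full_matrices=False)
    # Columns of the returned matrix are the leading right singular vectors
    # of the reduced data, so the layer starts by projecting its input onto
    # the dominant patterns of the training set.
    return Vt[:n_hidden].T                 # shape (144, 128)
```

Because the rows of V^T are orthonormal, such an initial weight matrix is well conditioned, which is consistent with the smoother early training reported below.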
3.2 Practical application of SVD

3.2.1 The problem of overfitting and the dimensionality-reduction assumption

For the vast majority of small networks trained for classification, overfitting on the training data is a real problem: having a limited number of parameters, they cannot reliably generalize the dataset, identify the patterns present in it, or overcome noise, and therefore predict test data poorly.

One way to overcome this is to use convolutional neural networks, but our goal is small networks, so this approach is unsuitable for two reasons:

1. Model size: whether we train our own feature-extraction layers or use ready-made ones (e.g., the ResNet or VGG16 architectures), we significantly increase the size of the model by storing additional parameters; for example, the size of such layers for ResNet starts at 44 MB [7].
2. Training complexity: if we train the convolutional feature layers (CNNs) ourselves, it is time-consuming; if we use ready-made ones, prediction still incurs a higher load simply because of the larger number of parameters involved in the computation.

Figure 4: Scheme of taking into account only the K most important features

An alternative would be to reduce the dimensionality of the data directly. Done naively, however, this discards random pixels, which leads to significant differences between models under otherwise identical conditions. With SVD the reduction is still feasible, since we can keep only the components associated with the largest singular values, reducing the feature space to a given number of dimensions (Figure 4). This advantage of the method is the most significant one in our work, as it reduces the size of the input data without significant loss of information.

4. Preliminary processing and additional information

For this work, we used the Python programming language with the TensorFlow framework and the MNIST dataset. Augmentation is also used in data preprocessing to better generalize the features of the input images (Figure 5).

Figure 5: Example of an image before and after augmentation

It is worth noting that different implementations of the SVD method are used for weight initialization and for image preprocessing. For the images, the TruncatedSVD function is used; it differs from the exact method in that it is approximate and adds some noise to the result, which simulates weak augmentation of the input data. For the weights, an exact implementation of SVD is used, since any small perturbation can move the initial training point far away from the global minimum.
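As an illustration, the sketch below shows how the two variants could be combined in code: scikit-learn's TruncatedSVD (randomized, hence slightly noisy) for the images and NumPy's exact SVD for the weight-initialization step. The data loading and the parameter choices other than the 144 components are our own assumptions, not code from the paper.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from tensorflow.keras.datasets import mnist

# Load MNIST and flatten each 28x28 image to a 784-dimensional vector.
(X_train, _), (X_test, _) = mnist.load_data()
X_train = X_train.reshape(-1, 784).astype("float32") / 255.0
X_test = X_test.reshape(-1, 784).astype("float32") / 255.0

# 1) Approximate, randomized SVD for the images: the small numerical noise
#    in its output acts like a weak augmentation of the input data.
svd_images = TruncatedSVD(n_components=144, algorithm="randomized",
                          random_state=0)
X_train_red = svd_images.fit_transform(X_train)     # (60000, 144)
X_test_red = svd_images.transform(X_test)           # (10000, 144)

# 2) Exact SVD for the weight-initialization step, since even a small
#    perturbation of the starting point can move training far from
#    the global minimum.
_, _, Vt = np.linalg.svd(X_train_red, full_matrices=False)
```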
5. Mathematical model of classification

Since we use SVD and keep only the first N most important features of the image matrix, we take N = 12 × 12 = 144 pixels of the input image (28 × 28 = 784), which is only 18.4% of the input information.

The main task of the model in this paper is multiclass classification. The task is to find a mapping operator y*: X → Y for objects not included in the training set, with a minimum norm in Euclidean space:

$\min \lVert y^{*} - y \rVert$, (1)

where y is the target classifier and y* is the neural network classifier. The mapping operator is known only on the objects of a finite training set Xm = {(x1, y1), ..., (xm, ym)}, where Xm is the training set of size m. The task is to build an algorithm that can determine whether an arbitrary object x ∈ X belongs to the class y ∈ Y.

5.1 Neural network morphology using SVD

We consider only small feed-forward neural networks, so we chose one hidden layer of size 128 with the ReLU activation function, a flattening layer at the input, and an output layer of size 10 with the softmax activation function. The flattening layer has the dimension equal to the product of all dimensions of the input photo; since we keep only the first N most important features, this dimension is 144. The network is trained with the Adam optimizer, the categorical cross-entropy loss function, and the accuracy metric (Figure 6).

Figure 6: Combined architecture of a neural network with one hidden layer and an SVD block

The output of this network is described by the relation

$\bar{y}_i = f_{\mathrm{softmax}}\left( \sum_{j=1}^{128} \omega_{ij}\, f_{\mathrm{relu}}\left( \sum_{k=1}^{144} \omega_{jk} x_k \right) \right), \quad i \in \{1, \dots, 10\}$, (2)

where $f_{\mathrm{softmax}}$ is the softmax activation function, $f_{\mathrm{relu}}$ is the ReLU activation function, $\omega_{ij}$ is an element of the weight matrix between the hidden and output layers, $\omega_{jk}$ is an element of the weight matrix between the input layer and the hidden layer, and $x_k$ is an element of the input image vector.

5.2 Neural network morphology without SVD

The structure is completely analogous to the first network, except for the dimensionality of the input image and of the flattening layer: since no features are discarded here, the entire photo (28 × 28) enters the network, so the first dimension of the photo and the dimension of the flattening layer are 784 (Figure 7).

Figure 7: Architecture of a neural network with one hidden layer

$\bar{y}_i = f_{\mathrm{softmax}}\left( \sum_{j=1}^{128} \omega_{ij}\, f_{\mathrm{relu}}\left( \sum_{k=1}^{784} \omega_{jk} x_k \right) \right), \quad i \in \{1, \dots, 10\}$, (3)

where the notation is the same as in (2).
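A minimal Keras sketch of both morphologies follows; it is our own illustration under the hyperparameters stated above, not code from the paper. The flattening layer is replaced by a flat input, since the vectors are assumed to arrive already flattened, and the commented line shows where a hypothetical V^T-based matrix W0 (as in the Section 3 sketch) could seed the first layer.

```python
import tensorflow as tf

def build_mlp(n_inputs):
    """One hidden ReLU layer of 128 units and a 10-class softmax output;
    n_inputs is 144 (with SVD) or 784 (without)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_inputs,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

svd_model = build_mlp(144)   # Eq. (2): input is the 144 SVD features
baseline = build_mlp(784)    # Eq. (3): input is the raw 28*28 image

# Optional, hypothetical seeding of the first layer with W0 of shape (144, 128):
# svd_model.layers[0].set_weights([W0, np.zeros(128, dtype="float32")])
```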
6. Experimental results and their analysis

In this work, we compared the performance of a small fully connected neural network that uses SVD for image processing and weight initialization against the same model without SVD. The data for training and testing were taken from the MNIST dataset of handwritten digits, because it contains many small images (28 × 28 pixels).

An important aspect of the experiment is the analysis of the model's loss function. The results show that when SVD is used for weight initialization and image processing, the loss function stabilizes quickly and reaches its horizontal asymptote, which indicates that a global minimum is reached, already in the first training epoch (Figure 8). This emphasizes the high efficiency of SVD in quickly reaching the optimal solution.

Figure 8: Dependence of the image classification accuracy and the model loss function on the number of epochs

In contrast, in the model without SVD the loss function decreases gradually over the initial epochs and keeps decreasing throughout the training process. Moreover, even at the end of training, the loss on the training sample remains higher than in the SVD model. On the test set, we see a slight increase in loss, but this can be explained by the generally imperfect approximation of results in our dataset, so we can take the training set as the key parameter for comparison, since the difference in accuracy between the two sets is not significant. This indicates that the model without SVD does not reach as good a solution as the model with SVD.

The data analysis confirms that using SVD for image preprocessing and weight initialization improves the classification accuracy of small fully connected neural networks over a small number of epochs. The significant reduction of the loss in the early stages of training with SVD also emphasizes its potential to accelerate learning, making it well suited for scenarios where time is a limited resource. The results shown in the graphs suggest that SVD can be an important tool for improving the efficiency of small neural networks and can be recommended for a wide range of machine-learning applications.

7. Conclusions

A method for recognizing object images using a multilayer perceptron combined with singular value decomposition (SVD) has been developed. The method allows for efficient image processing under limited computing resources.

It was found that using SVD for weight initialization and image processing improves the network performance. The use of SVD helps to reach the global minimum faster, which allows the optimal result to be achieved sooner, and it makes the learning curve smoother, which simulates a lower learning rate without directly changing it.

It is proved that the use of SVD improves the accuracy of object image recognition over a small number of training epochs. Experimental results demonstrate that SVD helps to reduce the loss in the early stages of training.

The programming language used is Python with the TensorFlow framework and the MNIST dataset. Augmentation is also used in data preprocessing to better generalize the features of the input images. The network uses the Adam optimizer, the categorical cross-entropy loss function, and the accuracy metric. There is one hidden layer of size 128 with the ReLU activation function; the input is a flattening layer, and the output is a layer of size 10 with the softmax activation function.

References

[1] Y. Chen, S. Tong, F. Cong, J. Xu. Symmetrical singular value decomposition representation for pattern recognition. Neurocomputing, vol. 214, 2016, pp. 143-154. doi: 10.1016/j.neucom.2016.05.075
[2] B. Bermeitinger, T. Hrycej, S. Handschuh. Singular Value Decomposition and Neural Networks. In: Artificial Neural Networks and Machine Learning - ICANN 2019: Deep Learning, Lecture Notes in Computer Science, vol. 11728. Springer, Cham, 2019. doi: 10.1007/978-3-030-30484-3_13
[3] Z. A. Al-Saffar, T. Yildirim. A Novel Approach to Improving Brain Image Classification Using Mutual Information-Accelerated Singular Value Decomposition. IEEE Access, vol. 8, 2020, pp. 52575-52587. doi: 10.1109/ACCESS.2020.2980728
[4] L. Zheng, Z. Wang, J. Liang, S. Luo, S. Tian. Effective compression and classification of ECG arrhythmia by singular value decomposition. Biomedical Engineering Advances, vol. 2, 2021, 100013. doi: 10.1016/j.bea.2021.100013
[5] H. Cho, S. M. Yoon. Applying singular value decomposition on accelerometer data for 1D convolutional neural network based fall detection. Electronics Letters, vol. 55, no. 6, 2019, pp. 320-322. doi: 10.1049/el.2018.6117
[6] T. Huang, R. Zhao, L. Bi, D. Zhang, C. Lu. Neural Embedding Singular Value Decomposition for Collaborative Filtering. IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 10, 2022, pp. 6021-6029. doi: 10.1109/TNNLS.2021.3070853
[7] Model VGG 16. https://pytorch.org/vision/main/models/generated/torchvision.models.vgg16.html. Access date: 08.04.2024.