Handwritten Ukrainian Character Recognition Using Convolutional Neural Networks and a Synthetic Dataset

Yevhen Chychkarov, Olha Zinchenko
State University of Telecommunications, Solomenska street, 7, Kyiv, 03110, Ukraine

Abstract
This paper considers several options for the architecture of convolutional neural networks for the recognition of isolated handwritten Ukrainian characters and digits, trained on a synthetic dataset built from a set of handwritten and cursive fonts. A comparison of recognition results for several variants of images containing handwritten letters and digits, using neural networks with different architectures, showed that increasing the number of convolutional layers reduces the frequency of erroneous character recognition. The size of the training dataset significantly affects the reliability of character recognition. The datasets used in this work contained from 192 to 2304 samples per class; the upper part of this range is close to the minimum number of samples per class that provides acceptable recognition accuracy. Reducing the number of samples per class leads to a significant decrease in recognition accuracy (from 90% recognition accuracy on elements of real inscriptions to 40-60% with a 4-fold reduction in sample size).

Keywords
handwriting recognition; recognition of Ukrainian characters; convolutional neural networks (CNN); digit recognition; deep learning; image processing

1. Introduction
Optical character recognition (OCR) is a widely used technology. At its core is the process of classifying images of symbols, selected from the original digital image, against the corresponding samples [1].
Information technologies based on optical recognition solve a wide range of practical tasks: identification of vehicle registration numbers from images of license plates, which helps control traffic [2]; conversion of printed academic records into text for storage in an electronic database; decoding of ancient inscriptions and texts; and automatic data entry by optical scanning of cards or bank checks. In most cases, modern optical recognition systems are based on deep learning neural networks [3, 4]. Convolutional neural networks (CNN), one of the most popular types of deep neural networks, are widely used for image processing and can effectively recognize characters present in an image [5].

2. Literature review
Convolutional neural networks are widely used to solve optical recognition problems. They are able to automatically extract features from the input data. These properties make them a very convenient tool for computer vision problems, in particular for recognizing images of letters or digits.

MoMLeT+DS 2023: 5th International Workshop on Modern Machine Learning Technologies and Data Science, June 3, 2023, Lviv, Ukraine
EMAIL: chychksrovea@gmail.com (Y. Chychkarov); zinchenkoov@gmail.com (O. Zinchenko)
ORCID: 0000-0002-4362-5129 (Y. Chychkarov); 0000-0002-3973-7814 (O. Zinchenko)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

Initially, most research focused on recognition of the Latin alphabet, but in recent years other scripts have attracted increasing attention: Arabic, Russian, Kazakh, Chinese, Indian, and others [6-11]. For research on handwritten Latin alphabet recognition, the EMNIST dataset became the de facto standard [12].
Many different neural network architectures have been proposed to classify the images of this set. One of the first successful attempts to use deep learning for character recognition was the LeNet-5 architecture [13], which showed the highest classification accuracy on handwritten digits among the solutions available at that time (1998). Similar solutions are still widely used in relatively recent work on computers with low computing power. For example, the ConvNet architecture proposed in [14] consists of two convolutional layers with 5x5 kernels, each followed by a ReLU activation and a MaxPooling layer, and two fully connected layers: one with 500 neurons and a final output layer with 26 classes. Such a neural network has only 60,000 trainable parameters, far fewer than AlexNet (60 million parameters and 650,000 neurons) [15] or GoogleNet (6.8 million parameters) [16]. The best results for handwritten digit recognition on the EMNIST Digits (or MNIST) datasets were achieved with convolutional neural networks (see [17] for a review). One way to improve the accuracy of letter or digit recognition is to use models with a more complex architecture than AlexNet or LeNet. For example, good recognition accuracy was achieved with capsule layers [18]. The authors of [19] proposed a convolutional neural network that contains 14 convolutional layers to represent character features, two MaxPooling layers to reduce the number of features or highlight strong ones, one softmax layer, and one classification layer for isolated character recognition. Pre-training on ImageNet accelerates convergence, especially at the beginning of training.
However, for models with random initialization, the results achieved after a comparable number of epochs do not differ from those of pre-trained models [20]. According to [21], models trained from scratch, as a rule, give better results than pre-trained models in the recognition of handwritten Arabic characters. Regarding the complexity of the CNN architectures used, [21] found that less complex CNN models are less accurate but offer higher classification and training speed (and vice versa). Based on their results, the authors of [21] suggested that training all of the models from scratch can improve classification accuracy and the speed of obtaining results, regardless of model complexity. Numerous studies of handwritten character recognition have used quite complex neural network architectures. For example, in [22], modern pre-trained CNN architectures were used to classify 231 different Bangla handwritten characters from the CMATERdb dataset. The images were first converted to black and white with a white foreground color, resized to 28x28 pixels, and used as input for the CNN architectures under study. The learning rate was set to 0.001, and categorical cross-entropy was used as the loss function. After 50 epochs, InceptionResNetV2 achieved the best accuracy (96.99%); DenseNet121 and InceptionNetV3 also demonstrated excellent recognition accuracy (96.55% and 96.20%, respectively). The authors of [22] also considered an ensemble of the pre-trained architectures InceptionResNetV2, InceptionNetV3, and DenseNet121, which provided even better recognition accuracy (97.69%) than any individual CNN architecture, but concluded that it requires so much computing power and memory that it is hard to use in practice.
The models were also tested on cases where character recognition appears difficult to a human; all architectures showed a comparable ability to recognize such images reliably. According to [22], InceptionResNetV2 can be called the most efficient model when computational complexity, memory footprint, and the ability to recognize distorted symbols are all taken into account. There is also research on neural network architectures used without prior training on ImageNet. For example, in [23], two convolutional neural networks with different architectures, varying in depth, width, and number of parameters, were tested on Devanagari character recognition. The first model consisted of three convolutional layers and one fully connected layer; the second came from the LeNet family and consisted of two convolutional layers followed by two fully connected layers. The best recognition accuracy (over 98%) was obtained with the model with more convolutional layers. A similar result was obtained in [24], where the authors investigated three CNN architectures: LeNet-5, a modified variant of LeNet, and an AlexNet CNN. With the last of these, Devanagari character recognition accuracy of 99% was achieved. Numerous experiments with several convolutional neural networks (a basic CNN, VGG-16, and ResNet) were conducted in [25] using regularization approaches such as filtering and data augmentation. The VGG and ResNet architectures gave close recognition accuracy: ResNet achieved the best result at 98.57%, while VGG-16 reached 97.14%. The work [26] also noted higher recognition accuracy with a deeper CNN architecture.
However, this gain in recognition accuracy is achieved only with input data augmentation. In [27], different CNN architectures were investigated on the EMNIST dataset. According to [27], the GoogleNet architecture always gives higher accuracy than ResNet18, but requires 2.5-2.9 times more time to train the model. Architectures that rely on pre-training were created to classify color images of different sizes. Therefore, for many datasets (e.g., EMNIST Letters, 28x28), single-channel images must be converted to three-channel form to use existing libraries and pre-trained weights [27]. In particular, the ResNet module from the tensorflow package requires an input image of at least 32x32x3. With a modified CNN architecture and training without loading pre-trained weights, the input data may remain single-channel. Comparing color and monochrome image recognition, [27] reports that monochrome variants with an input size of 40x40 pixels (for the resized EMNIST dataset) and rotation and shift augmentation gave the best results among the models studied (ResNet18 and GoogleNet). For Cyrillic characters, similar studies are relatively few. There is experience with a 30-layer MobileNet architecture [28] for recognizing characters of the Kazakh and Russian languages. Some results on Cyrillic character recognition are also presented in [29-30]. As for datasets for the recognition of Ukrainian letters, only individual works in this direction are known. According to [31], when creating a dataset for model training, it is necessary to distinguish between uppercase and lowercase letters, as well as to take into account the possibility of different spellings of the same letter.
The authors of [31] identified more than 70 classes that form a complete set of symbols of the Ukrainian language (for example, different spellings of the lowercase letter "a" were taken into account).

3. Experimental setup and proposed approach
There are quite a few studies of handwriting recognition technologies based on the EMNIST dataset [17] (at least for English). There is also experience of using various classifiers and neural network technologies to recognize Cyrillic alphabet symbols, but comparative studies of recognition technologies for them are fragmentary, and there are no EMNIST-like datasets for the Ukrainian alphabet. This article investigates the recognition of Cyrillic (mainly Ukrainian) handwritten letters using convolutional neural networks and analyzes the influence of the chosen neural network architecture on the accuracy and reliability of recognition. In addition, the possibility of using a synthetic dataset and the effect of augmentation of the original dataset on the recognition results were investigated. The goals of this study are:
• Analysis of the influence of the architecture of convolutional neural networks on the accuracy of recognition of handwritten digits and letters of the Ukrainian alphabet.
• Analysis of the peculiarities of recognizing Ukrainian symbols when convolutional neural networks are trained on a synthetic dataset with various options for enlarging the training sample.

3.1. Building a dataset for model training
The dataset used for training the models was built using a set of handwritten and italic fonts (a total of 48 font variants with Ukrainian glyphs were selected). All images of letters and digits were divided into 76 classes (33 lowercase letters, 33 uppercase letters, and 10 digits) or 43 classes (33 letters and 10 digits).
All images of the dataset were centered, and datasets were created with image sizes of 28x28, 32x32, 64x64, or 128x128 pixels. The Pillow library was used to create and transform images with letters or digits (including conversion from one-channel to three-channel form). The test dataset was built using the same fonts; specific fonts and augmentation options were chosen randomly. The volume of the test dataset was about 10% of the volume of the training one. Because only a small number of suitable fonts with Ukrainian glyphs exist, augmentation was required to form the necessary dataset. We used the ImageDataGenerator from the tensorflow package to perform three kinds of character-image transformation: random rotation, shift, and scaling. The number of generated images varied from 2 to 48 per symbol. At 32 images per symbol, the total volume of the dataset was 116,736 samples (32 images per character per font). This volume is comparable to the EMNIST Letters dataset [12, 17], which contains mixed lowercase and uppercase letters (26 classes and a total of 145,600 samples).

3.2. Preprocessing of images for recognition
Tools from the OpenCV library were used to select the image regions containing letters or digits, which were then recognized. The findContours function or the algorithm for extracting maximally stable extremal regions (MSER) was used to select the contours of recognized symbols. Preprocessing and selection of the areas containing letters or digits included the following stages:
1. Image filtering to reduce the noise level (a Gaussian filter, function cv2.GaussianBlur);
2. Binarization of the image to cut off noise (the cv2.threshold function, with parameters chosen for reliable selection of character contours);
3. Morphological transformation (dilation, function cv2.dilate, several iterations);
4. Selection of contours and their sorting (using the function cv2.findContours);
5. Image segmentation, i.e., selection of recognition areas as a set of rectangles containing the contours of letters and digits (the cv2.boundingRect function).
For recognition itself, the selected regions of interest were cut from the original image, binarization was applied to them again, and the resulting images of individual symbols (without dilation or other distortions) were scaled to the size of the dataset images. Each pixel value was in the range 0 to 255, so the values were normalized by dividing by 255 so that all values in the array describing the image fell in the range 0 to 1.

3.3. Proposed CNN Architectures
At the first stage of the research, the models were trained using single-channel images of 28x28 pixels. The simplest variants of LeNet-type convolutional architectures for character image recognition are presented in Table 1; more complex architectures are presented in Table 2. Architectures 4 and 5 are implementations of the AlexNet architecture for single-channel images. Architecture 6 included thirteen convolutional layers and three dense layers, as well as MaxPooling and Dropout layers. This variant is the most complex and reproduces the VGG-16 architecture for single-channel images; it also turned out to be the best in terms of accuracy and reliability when recognizing the test sample and real inscriptions.
Table 1
The simplest variants of convolutional neural network architecture

Architecture 1: Input (28x28x1); conv2d, 128 filters; conv2d, 128 filters; MaxPooling2D; Dropout; Flatten; Dense, 256 units; Dropout; Dense (output: 76 classes).

Architecture 2: Input (28x28x1); conv2d, 64 filters; conv2d, 64 filters; MaxPooling2D; Dropout; conv2d, 128 filters; conv2d, 128 filters; MaxPooling2D; Dropout; Flatten; Dense, 256 units; Dropout; Dense (output: 76 classes).

Architecture 3: Input (28x28x1); conv2d, 128 filters; conv2d, 128 filters; MaxPooling2D; Dropout; conv2d, 256 filters; conv2d, 256 filters; MaxPooling2D; Dropout; conv2d, 512 filters; conv2d, 512 filters; MaxPooling2D; Dropout; Flatten; Dense, 1024 units; Dropout; Dense (output: 76 classes).

Table 2
Variants of the architecture of convolutional neural networks such as AlexNet and VGG-16 (architectures 4-6, as described in the text above)

Architecture 1 included an input layer, one convolution block of two layers, a MaxPooling subsampling layer, a Dropout regularization layer, a Flatten dimensionality-transformation layer, a dense layer, another regularization layer, and an output layer. Two more variants with an increased number of convolutional blocks are also presented in Table 1 (architecture 2 and architecture 3).
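As we read Table 1, Architecture 1 might be sketched in Keras as follows; the kernel sizes, activations, and dropout rates are not specified in the table, so the values below are our assumptions:

```python
# Sketch of "Architecture 1" from Table 1 (hyperparameters assumed).
from tensorflow.keras import layers, models

def build_architecture_1(num_classes=76):
    return models.Sequential([
        layers.Input(shape=(28, 28, 1)),                 # single-channel input
        layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
        layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),  # 76 classes
    ])
```

Architectures 2 and 3 extend this pattern by repeating the conv-conv-pool-dropout block with more filters before the Flatten layer.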
They differ from the simplest variant by the number of two-layer convolutional blocks (two blocks in architecture 2, three blocks in architecture 3). At the second stage of the research, given that recognition errors remained even with the best models, several more complex neural network architectures were considered: VGG16 and VGG19 [32], ResNet [33] and ResNetV2 [34], MobileNet and MobileNetV2 [35, 36], and InceptionResNetV2 [37]. Several variants of the implemented ResNetV2-family architectures are shown in Figure 1.

Figure 1: Examples of implementation of models with architectures of the ResNetV2 family

4. Experimental results and discussion
An example of the learning results of the model with architecture 1 is presented in Figure 2. An example of the learning results of the model with architecture 6 (recognition accuracy and loss versus the number of training epochs) is presented in Figure 3.

Figure 2: Results of learning the simplest LeNet-type model (a: model accuracy; b: model loss)

All the architectures, when trained on the maximum-size sample, provided a recognition accuracy of 95-99% on the sample items. Increasing the number of neural network parameters through a deeper architecture leads to higher recognition accuracy, while the training time grows with the number of adjustable parameters (approximately an order of magnitude between architectures 1 and 6). However, when recognizing images with real inscriptions that belong to neither the training nor the test sample, the studied architecture variants behaved very differently with respect to reliable character recognition. A typical example of recognizing an inscription containing letters is given in Table 3.
As can be seen from the obtained results, 100% recognition accuracy is provided only by the most complex variant of the architecture (variant 6). An attempt to recognize an inscription containing only digits gave an even more pronounced result, see Table 4.

Figure 3: Results of learning the VGG-16-type model (a: model accuracy; b: model loss)

Table 3
A sample of the results of recognizing an inscription with letters (the inscription on the image is "АБіїв" in Ukrainian; the image also shows the selected recognition areas)
Architecture 1: recognized as "ДБІ6", accuracy score 50%
Architecture 2: "Абієв", 100%
Architecture 3: "ДБіїв", 80%
Architecture 4: "ДБіїв", 80%
Architecture 5: "Абієв", 100%
Architecture 6: "Абієв", 100%

Table 4
A sample of the results of recognizing an inscription with digits (the inscription on the image is "12345"; the image also shows the selected recognition areas)
Architecture 1: recognized as "Іг5ц5", accuracy score 20%
Architecture 2: "Іг3ц5", 40%
Architecture 3: "1г5ц5", 40%
Architecture 4: "123ц5", 80%
Architecture 5: "123ц5", 80%
Architecture 6: "12345", 100%

Similar results were obtained for many other inscriptions, including those containing both letters and digits: acceptable recognition accuracy was obtained only with the more complex architecture variants, and on some samples recognition errors occurred even with the deep architectures. Neural networks for all architectures were trained using the Adam optimizer with a learning rate of 0.0001 for 50 training epochs. The size of the training sample strongly affects the reliability of character recognition. The generation of 1,536 images per letter or digit (32 images per character for 48 font types) is effectively the limit for acceptable recognition accuracy.
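The training setup described above (Adam optimizer, learning rate 0.0001, 50 epochs, with rotation/shift/scaling augmentation via ImageDataGenerator) can be sketched as follows; the batch size and the augmentation ranges are illustrative assumptions:

```python
# Sketch of the training configuration: Adam (lr=1e-4), 50 epochs,
# augmentation by random rotation, shift, and zoom.
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def train(model, x_train, y_train, x_val, y_val, epochs=50):
    datagen = ImageDataGenerator(rotation_range=10,       # random rotation
                                 width_shift_range=0.1,   # shift transformation
                                 height_shift_range=0.1,
                                 zoom_range=0.1)          # scaling transformation
    model.compile(optimizer=Adam(learning_rate=1e-4),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model.fit(datagen.flow(x_train, y_train, batch_size=64),
                     validation_data=(x_val, y_val),
                     epochs=epochs)
```

The same configuration applies to all the architectures compared in Tables 3 and 4; only the model passed in changes.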
Reducing the sample size leads to a significant decrease in recognition accuracy (from 100% to 40-60% when the sample size is reduced by 4 times), while increasing the sample size noticeably increases the time spent on training the model. The use of ResNet or MobileNet architectures required forming the training dataset from three-channel images. It was established that reliable recognition of various alphanumeric inscriptions for all variants of the model architecture was achieved with a training set of sufficient size. Training a model on three-channel images, especially as the resolution of the training sample increases, is a very resource-intensive process; the authors were therefore forced to reduce the number of recognized classes to 43, abandoning the distinction between lowercase and uppercase letters. An example of the recognition result for alphabetic and numeric inscriptions is shown in Figure 4, which presents the selected areas of interest and the recognition results.

Figure 4: An example of recognition results using the VGG16 neural network (in this case, all letters and digits are recognized accurately)

Comparing different model architectures, all the considered options showed a test-set recognition accuracy in the range of 99.2-99.6% when trained on a dataset of sufficient volume, and increasing the number of samples in the training dataset raised the recognition accuracy for all of them. An example of the experimental results for the model with the MobileNet architecture is shown in Figure 5. Recognition accuracy of 80-90% on real inscriptions was achieved with a training sample of at least 700, and preferably more than 1500, images per class. An example of the experimental results for the model with the MobileNetV2 architecture is shown in Figure 6.
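The conversion from single-channel to three-channel images required by the ResNet/MobileNet input layers can be sketched with NumPy by replicating the grayscale channel:

```python
# Sketch: replicate a grayscale channel three times so that pretrained-style
# input layers expecting (H, W, 3) accept the synthetic dataset images.
import numpy as np

def to_three_channel(batch_gray):
    """(N, H, W) or (N, H, W, 1) grayscale -> (N, H, W, 3)."""
    if batch_gray.ndim == 3:                      # add a channel axis if absent
        batch_gray = batch_gray[..., np.newaxis]
    return np.repeat(batch_gray, 3, axis=-1)      # copy the channel three times
```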
Variation of the parameters of the transformations used for augmentation also has a noticeable effect on the recognition results: deforming or rotating the image by more than 10-15% increases the frequency of errors. Increasing the resolution of the training sample images had little effect on the results due to saturation. For example, when training a model with the MobileNetV2 architecture, the recognition accuracy on the test dataset was 98% at a resolution of 32x32, 99% at 64x64, and 99.5% at 128x128 (an example is shown in Figure 7(a)). For other architectures, the effect of increased resolution was much less pronounced. The number of errors in recognizing elements of real inscriptions changed little: for the model with the ResNet152V2 architecture, increasing the resolution of the training images reduced the proportion of erroneous recognition from 18.0% to 11.4% (Figure 7(b)), while for models with the MobileNet or MobileNetV2 architecture it practically did not change. However, with an increase in the resolution of the training sample, the time spent on training increased quite significantly (by more than an order of magnitude).

Figure 5: An example of the influence of the size of the training dataset on the achieved recognition accuracy (MobileNet architecture; 192 to 1536 images per class)
Figure 6: Recognition errors on real inscriptions depending on the size of the training dataset (MobileNetV2 architecture, 32x32x3 dataset images)

When deep neural networks were used to recognize letters or digits, the reliability of recognizing elements of real inscriptions depended primarily on the size of the training dataset. The recognition accuracy on the test dataset after training was quite high for all model variants: 97-98% and above. However, small training datasets of 300-500 images per class provided practically no reliable recognition. The use of a model with the InceptionResNetV2 architecture, which requires a training-image resolution of at least 75x75x3 (in fact, the model was trained on 128x128x3 images), did not lead to a noticeable increase in recognition accuracy. In general, comparing the achieved accuracy of recognition of real images and the speed of model training, the best performance was provided by models of the ResNetV2 or MobileNetV2 families. Experiments with other optimization algorithms available in the Tensorflow/Keras package did not improve the accuracy or reliability of recognition of real samples, and increasing the number of training epochs beyond the chosen value also did not change the results.

Figure 7: An example of the influence of the resolution of the training dataset images on the achieved recognition accuracy (a: training accuracy, MobileNetV2; b: recognition accuracy, ResNet152V2)

5. Conclusions
In this work, several variants of the architecture of convolutional neural networks for the recognition of isolated handwritten digits and Ukrainian letters were considered. The results of recognizing various images containing letters and digits were compared across models with different architectures.
It was established that, when training a model on a set of single-channel 28x28 images, increasing the number of convolutional layers of a neural network in most cases increases the reliability of recognition. Among the options considered, the best accuracy and reliability of recognition was provided by a model with a VGG16-type architecture, which included 13 convolutional and three dense layers. The possibility of training convolutional neural networks on a synthetic dataset built from handwritten or cursive fonts was shown. The size of the training dataset significantly affects the reliability of character recognition. The datasets used in this work contained from 192 to 2304 samples per class; the lower limit that provides acceptable recognition accuracy was 1536 samples per class. Reducing the number of samples per class leads to a significant decrease in recognition accuracy (from 90% recognition accuracy on elements of real inscriptions to 40-60% with a 4-fold reduction in sample size). Increasing the volume of the training dataset further did not improve the accuracy or reliability of recognition, but led to a significant increase in model training time. Increasing the image resolution of the training dataset from 32x32x3 to 128x128x3 in most cases did not improve the reliability of real image recognition.

6. References
[1] A. Chaudhuri, K. Mandaviya, S. K. Ghosh, P. Badelia, Optical Character Recognition Systems for Different Languages with Soft Computing, volume 352 of Studies in Fuzziness and Soft Computing, Springer, 2017. doi:10.1007/978-3-319-50252-6.
[2] H. Li, P. Wang, C. Shen, Toward end-to-end car license plate detection and recognition with deep neural networks, IEEE Transactions on Intelligent Transportation Systems 20(3) (2018) 1126-1136. doi:10.1109/TITS.2018.2847291.
[3] A. Rajavelu, M. T. Musavi, M. V.
Shirvaikar, A neural network approach to character recognition, Neural Networks 2(5) (1989) 387-393. doi:10.1016/0893-6080(89)90023-3.
[4] J. Bai, Z. Chen, B. Feng, B. Xu, Image character recognition using deep convolutional neural network learned from different languages, in: IEEE International Conference on Image Processing (ICIP), Paris, France, 2014, pp. 2560-2564. doi:10.1109/ICIP.2014.7025518.
[5] D. S. Maitra, U. Bhattacharya, S. K. Parui, CNN based common approach to handwritten character recognition of multiple scripts, in: 13th International Conference on Document Analysis and Recognition (ICDAR), 2015, pp. 1021-1025. doi:10.1109/ICDAR.2015.7333916.
[6] E. F. Bilgin Taşdemir, Online Turkish Handwriting Recognition Using Synthetic Data, Avrupa Bilim ve Teknoloji Dergisi 32 (2021) 649-656. doi:10.31590/ejosat.1039846.
[7] D. Nurseitov, K. Bostanbekov, D. Kurmankhojayev, A. Alimova, A. Abdallah, R. Tolegenov, Handwritten Kazakh and Russian (HKR) database for text recognition, Multimedia Tools Appl. 80(21-23) (2021) 33075-33097. doi:10.1007/s11042-021-11399-6.
[8] A. Abdelrahman, M. Hamada, D. Nurseitov, Attention-Based Fully Gated CNN-BGRU for Russian Handwritten Text, Journal of Imaging 6(12) (2020) 141. doi:10.3390/jimaging6120141.
[9] Z. Ullah, M. Jamjoom, An intelligent approach for Arabic handwritten letter recognition using convolutional neural network, PeerJ Computer Science 8 (2022) e995. doi:10.7717/peerj-cs.995.
[10] D. Jeevitha, S. Muthu, I. Nila, V. Santhoshi, Handwritten Letter Recognition using Artificial Intelligence, International Journal for Research in Applied Science and Engineering Technology 10 (2022) 2752-2758. doi:10.22214/ijraset.2022.42949.
[11] L. Gannetion, K. Y. Wong, P. Y. Lim, K. H. Chang, A. F. L. Abdullah, An exploratory study on the handwritten allographic features of multi-ethnic population with different educational backgrounds, PLoS ONE 17(10) (2022) e0268756. doi:10.1371/journal.pone.0268756.
[12] G. Cohen, S. Afshar, J.
Tapson, A. Van Schaik, EMNIST: Extending MNIST to handwritten letters, in: 2017 International Joint Conference on Neural Networks (IJCNN), 2017, pp. 2921-2926. doi:10.48550/arXiv.1702.05373.
[13] Y. LeCun, B. E. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. E. Hubbard, L. D. Jackel, Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation 1 (1989) 541-551. doi:10.1162/neco.1989.1.4.541.
[14] D. Núñez Fernández, S. Hosseini, Real-Time Handwritten Letters Recognition on an Embedded Computer Using ConvNets, in: 2018 IEEE Sciences and Humanities International Research Conference (SHIRCON), Lima, Peru, 2018, pp. 1-4. doi:10.1109/SHIRCON.2018.8592981.
[15] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, Commun. ACM 60(6) (2017) 84-90. doi:10.1145/3065386.
[16] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1-9. doi:10.1109/CVPR.2015.7298594.
[17] A. Baldominos, Y. Saez, P. Isasi, A Survey of Handwritten Character Recognition with MNIST and EMNIST, Appl. Sci. 9(15) (2019) 3169. doi:10.3390/app9153169.
[18] B. Mandal, S. Dubey, S. Ghosh, R. Sarkhel, N. Das, Handwritten Indic Character Recognition using Capsule Networks, in: 2018 IEEE Applied Signal Processing Conference (ASPCON), 2018, pp. 304-308. doi:10.1109/ASPCON.2018.8748550.
[19] K. S. Yadav, K. Anish Monsley, S. A. Barlaskar, N. Ahmad, R. H. Laskar, M. K. Bhuyan, Recognition of isolated characters across different input interfaces using 2D DCNN, in: TENCON 2021 - 2021 IEEE Region 10 Conference (TENCON), Auckland, New Zealand, 2021, pp. 504-509. doi:10.1109/TENCON54134.2021.9707451.
[20] K. He, R. B. Girshick, P. Dollár, Rethinking ImageNet Pre-Training, in: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp.
4917-4926. doi:10.1109/ICCV.2019.00502.
[21] W. Albattah, S. Albahli, Intelligent Arabic Handwriting Recognition Using Different Standalone and Hybrid CNN Architectures, Appl. Sci. 12 (2022) 10155. doi:10.3390/app121910155.
[22] T. Ghosh, M.-H.-Z. Abedin, H. Al Banna, N. Mumenin, M. A. Yousuf, Performance Analysis of State of the Art Convolutional Neural Network Architectures in Bangla Handwritten Character Recognition, Pattern Recognit. Image Anal. 31(1) (2021) 60-71. doi:10.1134/S1054661821010089.
[23] A. Bhardwaj, R. Singh, Handwritten devanagari character recognition using deep learning - convolutional neural network (CNN) model, PalArch's Journal of Archaeology of Egypt/Egyptology 17(6) (2020) 7965-7984. URL: https://archives.palarch.nl/index.php/jae/article/view/2203.
[24] D. Sai Prashanth, R. Vasanth Kumar Mehta, K. Ramana, V. Bhaskar, Handwritten Devanagari Character Recognition Using Modified Lenet and Alexnet Convolution Neural Networks, Wirel. Pers. Commun. 122(1) (2022) 349-378. doi:10.1007/s11277-021-08903-4.
[25] Aicha Korichi, Slatnia Sihem, Tagougui Najiba, Zouari Ramzi, Aiadi Oussama, Recognizing Arabic Handwritten Literal Amount Using Convolutional Neural Networks, in: Artificial Intelligence and Its Applications, 2022, pp. 153-165. doi:10.1007/978-3-030-96311-8_15.
[26] H. M. Balaha, H. A. Ali, M. Saraya, M. Badawy, A new Arabic handwritten character recognition deep learning system (AHCR-DLS), Neural Comput. Appl. 33(11) (2021) 6325-6367. doi:10.1007/s00521-020-05397-2.
[27] G. Al Amin Abo Samra, H. Oqaibi, An Optimized Deep Residual Network with a Depth Concatenated Block for Handwritten Characters Classification, Computers, Materials & Continua 680 (2021) 1-28. doi:10.32604/cmc.2021.015318.
[28] D. B. Nurseitov, K. Bostanbekov, M. Kanatov, A. Alimova, A. Abdallah, G.
Abdimanap, Classification of Handwritten Names of Cities and Handwritten Text Recognition using Various Deep Learning Models, arXiv preprint arXiv:2102.04816, 2021. doi:10.25046/aj0505114.
[29] O. Vovchuk, M. Kyrychenko, Recognition of Handwritten Cyrillic Letters using PCA, 2019. URL: https://www.researchgate.net/publication/336987544_Recognition_of_Handwritten_Cyrillic_Letters_using_PCA.
[30] Cyrillic-oriented MNIST. URL: https://github.com/GregVial/CoMNIST.
[31] V. Khavalko, V. Mykhailyshyn, R. Zhelizniak, I. Kovtyk, A. Mazur, Economic efficiency of innovative projects of CNN modified architecture application, in: CEUR Workshop Proceedings, vol. 2654, Proceedings of the International Workshop on Cyber Hygiene (CybHyg-2019), Kyiv, Ukraine, November 30, 2019, pp. 182-193. URL: https://ceur-ws.org/Vol-2654/paper14.pdf.
[32] K. Simonyan, A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, CoRR abs/1409.1556 (2014). doi:10.48550/arXiv.1409.1556.
[33] K. He, X. Zhang, S. Ren, J. Sun, Deep Residual Learning for Image Recognition, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778. doi:10.1109/CVPR.2016.90.
[34] K. He, X. Zhang, S. Ren, J. Sun, Identity Mappings in Deep Residual Networks, in: European Conference on Computer Vision (ECCV 2016), Springer, 2016, pp. 630-645. doi:10.1007/978-3-319-46493-0_38.
[35] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, ArXiv abs/1704.04861 (2017). doi:10.48550/arXiv.1704.04861.
[36] M. Sandler, A. G. Howard, M. Zhu, A. Zhmoginov, L.-C. Chen,
MobileNetV2: Inverted Residuals and Linear Bottlenecks, in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510-4520. doi:10.1109/CVPR.2018.00474.
[37] C. Szegedy, S. Ioffe, V. Vanhoucke, A. A. Alemi, Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, ArXiv abs/1602.07261 (2016). doi:10.1609/aaai.v31i1.11231.
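To give a sense of the scale of the VGG16-type model discussed in the conclusions (13 convolutional and three dense layers on single-channel 28x28 inputs), the following is a minimal parameter-count sketch, not the authors' implementation. The pooling schedule (four 2x2 poolings instead of VGG16's five, so the small input is not reduced to zero), the dense-layer widths, and the 43-class output size are all illustrative assumptions.

```python
# Parameter-count sketch of a VGG16-type network adapted for 28x28
# single-channel character images. Convolution widths follow the standard
# VGG16 configuration; the pooling schedule and head sizes are assumptions.

CONV_BLOCKS = [[64, 64], [128, 128], [256, 256, 256],
               [512, 512, 512], [512, 512, 512]]        # 13 conv layers total
POOL_AFTER_BLOCK = [True, True, True, True, False]      # assumed adaptation
DENSE_UNITS = [4096, 4096, 43]                          # 43 classes: hypothetical


def vgg16_like_params(side=28, channels=1, kernel=3):
    """Count trainable parameters, assuming 'same'-padded 3x3 convolutions."""
    params, in_ch, n_conv = 0, channels, 0
    for block, pool in zip(CONV_BLOCKS, POOL_AFTER_BLOCK):
        for out_ch in block:
            # each conv layer: kernel*kernel*in_ch weights per filter + bias
            params += kernel * kernel * in_ch * out_ch + out_ch
            in_ch = out_ch
            n_conv += 1
        if pool:
            side //= 2  # 2x2 max pooling halves each spatial dimension
    features = side * side * in_ch  # flattened input to the dense head
    for units in DENSE_UNITS:
        params += features * units + units  # weights + biases
        features = units
    return n_conv, len(DENSE_UNITS), params


if __name__ == "__main__":
    n_conv, n_dense, total = vgg16_like_params()
    print(f"{n_conv} conv layers, {n_dense} dense layers, {total:,} parameters")
```

Under these assumptions the spatial size shrinks 28 -> 14 -> 7 -> 3 -> 1, so the dense head receives a 512-element feature vector; the count illustrates why, as the conclusions note, training time grows quickly with such an architecture.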