<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Lekha. (2010). Data classification using support vector machine.
Journal of Theoretical and Applied Information Technology. 12. 1</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1109/21.97458</article-id>
      <title-group>
        <article-title>Handwritten Digits Recognition Using SVM, KNN, RF and Deep Learning Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yevhen Chychkarov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anastasiia Serhiienko</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iryna Syrmamiikh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anatolii Kargin</string-name>
          <email>kargin@kart.edu.ua</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Donetsk State University of Management</institution>
          ,
          <addr-line>Mariupol</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Pryazovskyi State Technical University</institution>
          ,
          <addr-line>Mariupol</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Ukrainian State University of Railway Transport</institution>
          ,
          <addr-line>Kharkiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <volume>21</volume>
      <issue>3</issue>
      <fpage>1</fpage>
      <lpage>7</lpage>
      <abstract>
        <p>This article discusses several classification algorithms for recognizing numbers in photographic images or from manual input, namely: support vector machine (SVM), K-nearest neighbors (KNN), random forest (RF) and several variants of neural networks. The success rates of the algorithms in handwriting recognition were compared. Six variants of recognition technology were analyzed and tested, using classifiers from the Scikit-learn package and deep learning neural networks. To construct and train the neural networks and to train the classifiers, the well-known and rather complete MNIST database of handwritten digits was chosen. Two types of neural networks were considered: sequential and convolutional. The neural networks were trained using a variable number of steps (epochs). Images for recognition were scaled to a size of 28x28 (784 cells in a one-dimensional representation). Preliminary processing of images (filtering, scaling, etc.) was carried out using the OpenCV library. For recognition, each image of a digit was converted to a size of 28x28 and fed to the input of a pre-trained neural network. A technique for selecting the area of interest in photographs containing handwritten digits for further recognition has been devised. For handwritten digit recognition, the best accuracy is provided by a convolutional neural network: 97.6% of ladle car digits were recognized correctly with it. To improve the recognition accuracy for handwritten digits, it is necessary to perform two additional stages of image preprocessing and dataset transformation. After building recognition models using all the algorithms mentioned above, the recognition accuracy of all handwritten digits on the test program turned out to be within 98-100%. For industrial images, regardless of the neural network version used, the recognition accuracy was 96-98%.</p>
      </abstract>
      <kwd-group>
        <kwd>Scikit-learn</kwd>
        <kwd>Classifier</kwd>
        <kwd>Keras</kwd>
        <kwd>TensorFlow</kwd>
        <kwd>MNIST</kwd>
        <kwd>Python</kwd>
        <kwd>Deep learning</kwd>
        <kwd>Neural networks</kwd>
        <kwd>Digit recognition</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction </title>
      <p>Optical recognition of numbers or digits in photographs has a number of technical applications,
e.g. recognition of railway carriage numbers, car license plates, product markings,
readings of recording devices, etc.</p>
      <p>This article discusses several classification algorithms for recognizing numbers in photographic
images or from manual input, namely: support vector machine (SVM), K-nearest neighbors (KNN),
random forest (RF) and several variants of neural networks. The success rates of the algorithms in
handwriting recognition were compared. The methods section of this article gives brief information
about handwriting recognition and the compared machine learning methods. The experimental
section compares the values obtained in the study and evaluates the
machine learning algorithms under comparison.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Analysis of literature data and formulation of the problem </title>
      <p>
        Optical Character Recognition (OCR) is an important area of research in artificial intelligence and
character recognition [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. For optical character recognition, many applications have been developed
that solve the problems of text information extraction, automatic recognition of car license plates and
railway carriage numbers [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>By now a wide range of studies have been carried out, including a comprehensive study and
implementation of various popular algorithms for recognizing handwritten text, including handwritten
numbers. The task of segmentation and recognition of image areas containing handwritten or printed
characters is relevant due to the presence of numerous technical applications.</p>
      <p>
        For example, license plate recognition in work [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] or registration and recognition of wagons and
tank numbers in work [
        <xref ref-type="bibr" rid="ref4 ref5">4-5</xref>
        ] are performed using neural network technologies. The functionality of all
these systems is approximately the same – they automate the process of reading numbers and store the
received information.
      </p>
      <p>Character recognition is a classification task that includes recognizing a set of characters in an
image, dividing them into 10 classes in case of numbers or 26 classes in case of the Latin alphabet
letters.</p>
      <p>
        Many systems are currently available for identifying printed text. However, according to [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the
identification of handwritten characters is still an open problem in the field of pattern recognition. A large
number of research and development efforts dedicated to OCR systems consider handwritten
digit recognition capabilities extensively.
      </p>
      <p>Several problems hinder the implementation of character recognition, namely:
        <list list-type="bullet">
          <list-item>
            <p>low quality of the photo or scanned copy, although this problem is partially solved by preliminary image processing;</p>
          </list-item>
          <list-item>
            <p>presence of distorted characters, especially when working with handwritten documents or numbers, due to the peculiarities of character style;</p>
          </list-item>
          <list-item>
            <p>similarity between the outlines of some characters.</p>
          </list-item>
        </list>
      </p>
      <p>
        For example, in work [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], the effect of preliminary processing and segmentation of license plate
number images on the results of their recognition was noted. For any recognition method, incorrect
character segmentation prevents accurate results.
      </p>
      <p>
        In works [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7-9</xref>
        ], [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] the study focused on comparing different CNN models with the
fundamental machine learning algorithms in terms of attributes such as performance, runtime and
complexity, for their explicit evaluation.
      </p>
      <p>
        In work [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], the author concluded that the Multilayer Perceptron classifier gave the most accurate
results with minimum error rate followed by Support Vector Machine, Random Forest Algorithm,
Bayes Net, Naive Bayes, j48, and Random Tree respectively. Authors [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] reported a comparison
between SVM, CNN, KNN, RFC and were able to achieve the highest accuracy of 98.72% using
CNN (which took maximum execution time) and lowest accuracy using RFC. In work [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] the authors
made a detailed comparison of SVM, KNN and MLP models for classifying handwritten text and
concluded that KNN and SVM predict all the classes of the dataset correctly with 99.26% accuracy, but
the process was a little complicated with MLP, which failed to classify the number 9; for this case the
authors suggested using a CNN with Keras to improve the classification.
      </p>
      <p>
        Improving the accuracy of handwritten digit recognition is achieved by increasing the complexity
of the used deep learning neural networks. For example, in [
        <xref ref-type="bibr" rid="ref6 ref7">6-7</xref>
        ] a convolutional neural network was used for
handwritten digit recognition on the MNIST dataset. The authors of work [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] used a
convolutional neural network with 7 layers, including 5 hidden layers, along with gradient descent and
a backpropagation model to find and compare the accuracy at different epochs, thereby reaching a
maximum accuracy of 99.2%. In work [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] the same authors briefly discussed different components
of CNNs, their advancement from LeNet-5 to SENet, and comparisons between different models such as
AlexNet, DenseNet and ResNet. The research outputs are as follows: LeNet-5 and LeNet-5 (with distortion)
achieved test error rates of 0.95% and 0.8% respectively on the MNIST data set; the architecture and
accuracy rate of AlexNet are the same as those of LeNet-5, but the network is much bigger, with around
4096000 parameters; and the “Squeeze-and-Excitation network” (SENet) became the winner of ILSVRC-2017,
having reduced the top-5 error rate to 2.25%, and is by far the most sophisticated CNN model that exists.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Goal and objectives of the research </title>
      <p>This paper examines machine learning and deep learning algorithms such as SVM, KNN, RF, CNN
and MLP for handwritten digit recognition and identifies which of them performs the task
efficiently. The following sections discuss the methodology and the implementation of all
the algorithms, and then present the results and conclusions. The last section lists the
references.</p>
      <p>Goals of this research:
        <list list-type="order">
          <list-item>
            <p>Evaluation of recognition accuracy for machine learning and deep learning algorithms such as SVM, KNN, RF, CNN and MLP on real sets of handwritten numbers.</p>
          </list-item>
          <list-item>
            <p>Analysis of the influence of image preprocessing algorithms on recognition accuracy.</p>
          </list-item>
          <list-item>
            <p>Evaluation of the possibility of using the considered algorithms to solve technical problems associated with processing noisy images of handwritten digits (using the recognition of cast iron ladle numbers as an example).</p>
          </list-item>
        </list>
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology </title>
    </sec>
    <sec id="sec-5">
      <title>4.1. Equipment and dataset </title>
      <p>The comparison of the algorithms (support vector machine, KNN, random forest, multilayer
perceptron and convolutional neural network) is based on the characteristics of each algorithm on
common grounds: the dataset, the number of epochs, the complexity of the algorithm, its accuracy,
the specification of the device used to execute the program (Ubuntu 20.10, 9th-generation i5
processor, 8 GB of memory) and the runtime of the algorithm under ideal conditions.</p>
      <p>
        To build and train the model, a well-known and fairly complete base of handwritten digits MNIST
was selected [
        <xref ref-type="bibr" rid="ref13 ref14 ref15">13-15</xref>
        ]. This database (MNIST - Modified NIST) contains a total of 70,000 handwritten
images of numbers and is part of the larger NIST database [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], which contains handwritten images
segmented from images of specially prepared templates filled in by respondents from the Census
Bureau and students of US educational institutions. In the MNIST database, images from different
authors have been placed in different parts of the dataset to enhance uniqueness.
      </p>
    </sec>
    <sec id="sec-6">
      <title>4.2. Algorithms and methods used</title>
      <p>
        The Support Vector Machine (SVM) was first proposed by Vapnik and since then has attracted a
high degree of interest in the machine learning research community [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. SVM is a supervised
machine learning algorithm. During training, SVM learns the relationship between each data item and
its label in the existing training set. Typically, data items are placed in n-dimensional space, where n
is the number of features, and each coordinate represents the value of one feature. Points are classified by
finding a hyperplane that distinguishes between two classes. The algorithm chooses a hyperplane that
separates the classes correctly, using the extreme vectors of each class to construct it.
These extreme cases are called support vectors, hence the name Support Vector
Machine. There are mainly two types of SVM, linear and non-linear [18]. Kernel function
selection is an important step in applying SVM to a problem [19]. SVM is successful in
solving classification problems compared to many other techniques.
      </p>
      <p>Decision trees form a classification model as a tree structure for the solution of a problem. The tree
structure and rules are easy to understand, which simplifies the implementation of the algorithm.
The decision tree method consists of simple sequential decision-making operations. One of the most
important steps in creating the tree structure is choosing the attribute whose value determines the
branching in the tree [20,21].</p>
      <p>A multilayer perceptron is a class of feedforward artificial neural networks that consists of at least
three layers: input, hidden, and output. Except for the input neurons, all neurons use a non-linear
activation function.</p>
      <p>The number of hidden layers can be increased to any number according to the problem, without
limitation on the number of nodes. The specific number of hidden layers or the number of nodes in a
hidden layer is difficult to determine due to the unstable nature of the model and is therefore chosen
experimentally. Each hidden layer in the model can have a different activation function.
For training, the model uses a supervised learning method called backpropagation. In an MLP, each
connection between nodes carries a weight that is adjusted as the model is
trained [22].</p>
      <p>KNN is one of the classification methods. Nearest neighbor searching is the following problem:
we are given a set S of n data points in a metric space X, and the task is to preprocess these points so
that, given any query point q ∈ X, the data point nearest to q can be reported quickly. This is also
called the closest-point problem and the post office problem. Nearest neighbor searching is an
important problem in a variety of applications [23]. The KNN algorithm assigns a new
observation to the class that is most common among its k nearest neighbors,
whose classes are already known. Classification is made according to a threshold value
determined from the average of the k closest data points. The performance of the
method is influenced by the number of nearest neighbors, the threshold value, the similarity measure and
a sufficient number of normal behaviors in the training set [24-25].</p>
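      <p>The majority-vote rule described above can be illustrated with a minimal sketch. It assumes Euclidean distance as the similarity measure and uses a hypothetical two-dimensional toy set; the function name knn_predict and the data are illustrative and are not taken from the study.</p>
      <preformat>
```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [math.dist(p, query) for p in train_points]
    # Indices of the k smallest distances
    nearest = sorted(range(len(train_points)), key=lambda i: distances[i])[:k]
    # Most common label among the k neighbours
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters labelled 0 and 1
train_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_points, train_labels, (0.15, 0.15), k=3))
print(knn_predict(train_points, train_labels, (0.95, 1.0), k=3))
```
      </preformat>
      <p>For image data, each point would be the 784-element pixel vector of a digit rather than a pair of coordinates.</p>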
      <p>Like decision trees, the random forest algorithm can be used in both classification and regression
problems. Its principle is to create more than one decision tree and to average the results of
these trees. The algorithm is called random because it introduces extra randomness
during the creation of the tree structure. When splitting a node, instead of looking for the best attribute
directly, it looks for the best attribute in a random subset of attributes. This creates more
diverse trees [21, 26-27].</p>
      <p>Deep convolutional neural networks (CNNs) are a specialized kind of artificial neural network (ANN)
that uses convolution in place of general matrix multiplication in at least one of its layers [28]. CNN is a
deep learning algorithm that is widely used for image recognition and classification. Unlike neural
networks with a simpler architecture, which have one or more hidden layers, CNNs are composed of many layers.
This feature allows them to compactly represent highly nonlinear and varying functions [29]. CNNs
involve many connections, and the architecture typically comprises different types of layers,
including convolution, pooling and fully connected layers, and realizes a form of regularization [30]. In
order to learn complicated features and functions that can represent high-level abstractions (e.g., in
vision, language, and other AI-level tasks), CNNs need deep architectures. Deep architectures,
including CNNs, consist of a large number of neurons and multiple levels of latent nonlinear
computation. According to [31], each level of a CNN architecture represents features at a different level of
abstraction, defined as a composition of lower-level features.</p>
      <p>CNN uses a filter (kernel) which is an array of weights to extract features from the input image.
CNN employs different activation functions at each layer to add some non-linearity [32].</p>
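      <p>To make the role of the filter concrete, the following sketch applies one kernel to a small image and passes the result through a ReLU activation. The 5x5 image and the 3x3 vertical-edge kernel are hypothetical values chosen for illustration, and, as in most deep learning libraries, the kernel is applied without flipping (strictly speaking, a cross-correlation).</p>
      <preformat>
```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    """Non-linearity applied after the convolution."""
    return np.maximum(x, 0.0)


# Toy 5x5 image with a vertical edge between columns 1 and 2
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=np.float64)
# Hypothetical 3x3 filter that responds to dark-to-bright vertical edges
kernel = np.array([[-1, 0, 1]] * 3, dtype=np.float64)

feature_map = relu(conv2d_valid(image, kernel))
print(feature_map)
```
      </preformat>
      <p>In a trained CNN the kernel weights are learned rather than fixed by hand, and each convolutional layer holds many such kernels.</p>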
      <p>Many works have noted the importance of image preprocessing for recognizing handwritten
numbers or letters.</p>
      <p>In particular, [33] applied various preprocessing methods to improve the performance of CNN
models. Translations, rotations and elastic deformations have been studied for a variety of goals in the
area of machine learning [34-36].</p>
      <p>According to the authors of [33], preprocessing with a combination of elastic deformation and rotation
improves the accuracy of the three analyzed networks by up to 0.71%.</p>
      <p>According to [37], many of today’s OCR systems are built following traditional approaches to
image processing and work well with printed text, but when they are used for handwritten text
recognition in images they can give unexpected results with poor recognition quality.</p>
      <p>The quality of a learned system depends primarily on the size and quality of the training set. A
simple technique for vastly expanding the training set on the basis of elastic distortions was proposed
in [36]. In [38] the size of the MNIST dataset was increased four times by using affine
transformations and prior knowledge of transform-invariant properties. In the proposed method,
elastic distortions are applied to each sample of the training set to generate nine new samples.</p>
      <p>These distortions improve the results on MNIST substantially.</p>
    </sec>
    <sec id="sec-7">
      <title>5. Implementation and Computer experiment results </title>
      <p>To compare the algorithms on the basis of accuracy, execution time, complexity, and the
number of epochs (for deep learning algorithms), this paper used four different classifiers: Support
Vector Machine Classifier, KNN Classifier, Random Forest Classifier and Multilayer Perceptron
Classifier.</p>
      <p>This set of algorithms was implemented using the Scikit-Learn package in the Python
programming language [38].</p>
      <p>The more complicated Multilayer Perceptron Classifier and Convolutional Neural Network Classifier
were implemented using the TensorFlow package with the Keras frontend in the Python programming
language.</p>
      <p>The Keras library contains numerous implementations of widely used building units of neural
networks, such as layers, target and transfer functions, optimizers, and many tools to simplify the
work with images and text.</p>
    </sec>
    <sec id="sec-8">
      <title>5.1. Pre-Processing of Images</title>
      <p>To select the areas of images containing recognizable digits, we used OpenCV library tools. The
findContours function or the maximally stable extremal regions (MSER) extractor was used
to highlight the contours of the digits.</p>
      <p>To recognize numbers in photographs of real cast iron ladle cars, an algorithm based on boundary
detection was used [39]. The algorithm for preprocessing the image and highlighting the area
containing the digits of the number included the following stages:
        <list list-type="order">
          <list-item>
            <p>image filtering to reduce the noise level (a Gaussian filter was used, the cv2.GaussianBlur function);</p>
          </list-item>
          <list-item>
            <p>binarization of the image to cut off noise (the cv2.threshold function was used, with parameters adapted to reliably select the outlines of the digits);</p>
          </list-item>
          <list-item>
            <p>highlighting of the contrasting borders in the image (the Canny edge detector was used, the cv2.Canny function, for which the maximum and minimum values of the gradient were selected in advance);</p>
          </list-item>
          <list-item>
            <p>filtering to reduce the effect of image heterogeneity (the median filter was used, the cv2.medianBlur function);</p>
          </list-item>
          <list-item>
            <p>image binarization (the cv2.threshold function was used again);</p>
          </list-item>
          <list-item>
            <p>morphological transformation (dilation, the cv2.dilate function);</p>
          </list-item>
          <list-item>
            <p>selection of contours and their sorting (contours were selected using the cv2.findContours function);</p>
          </list-item>
          <list-item>
            <p>image segmentation, i.e. selection of recognition areas in the form of a set of rectangles containing the previously selected contours of digits (the cv2.boundingRect function was used).</p>
          </list-item>
        </list>
      </p>
      <p>Images for recognition were scaled to a size of 28x28 (784 cells in a one-dimensional representation).
Each pixel value of the images lies between 0 and 255. These pixel values were normalized by
converting the dataset to 'float32' and then dividing by 255.0, so that the input features range
between 0.0 and 1.0. Next, one-hot encoding was performed to convert the y values into vectors of zeros
and ones, making each label categorical.</p>
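      <p>The normalization and one-hot encoding steps can be sketched with NumPy as follows. The random array stands in for real 28x28 digit images, and the function name preprocess is illustrative.</p>
      <preformat>
```python
import numpy as np


def preprocess(images, labels, num_classes=10):
    """Flatten, scale to the range 0.0..1.0 and one-hot encode the labels."""
    # 28x28 images become 784-element float32 vectors with values in 0.0..1.0
    x = images.reshape(len(images), 28 * 28).astype("float32") / 255.0
    # Row i of the identity matrix is the one-hot vector for class i
    y = np.eye(num_classes, dtype="float32")[labels]
    return x, y


# Placeholder data: four random 8-bit images with hypothetical labels
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
labels = np.array([3, 0, 9, 3])

x, y = preprocess(images, labels)
print(x.shape, y.shape)
```
      </preformat>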
    </sec>
    <sec id="sec-9">
      <title>5.2. Implementation of the considered algorithms</title>
      <p>Classical classification algorithms were implemented on the basis of the package Scikit-learn [40].
It is an open source machine learning library that supports supervised and unsupervised learning. It
also provides various tools for model fitting, data pre-processing, model selection and evaluation, and
many other utilities.</p>
      <p>To implement the Support Vector Machine algorithms, the sklearn.svm module was used. The
support vector machines in scikit-learn support both dense (numpy.ndarray and convertible to that by
numpy.asarray) and sparse (any scipy.sparse) sample vectors as input. However, to use an SVM to
make predictions for sparse data, it must have been fit on such data. For optimal performance, use a C-ordered
numpy.ndarray (dense) or scipy.sparse.csr_matrix (sparse) with dtype=float64.</p>
      <p>For binary and multiclass dataset classification, scikit-learn implements the SVC, NuSVC, and
LinearSVC classes. SVC and NuSVC are similar methods, but they accept slightly different sets of
parameters and have different mathematical formulations. LinearSVC, on the other hand, is another
(faster) implementation of Support Vector
Classification for the case of a linear kernel.</p>
      <p>The implementation of C-Support Vector Classification is based on libsvm. The fit time scales at
least quadratically with the number of samples and may be impractical beyond tens of thousands of
samples.</p>
      <p>The classifier was tuned by choosing the type of kernel, the regularization
parameter and the kernel coefficient (gamma) for the ‘rbf’, ‘poly’ and ‘sigmoid’ kernel types.</p>
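      <p>A tuning loop of this kind might look like the sketch below. To keep the run short it uses the small 8x8 digits set bundled with scikit-learn as a stand-in for MNIST, which is an assumption of this example; the grid of C values, gamma="scale" and the train/test split are likewise illustrative choices and not the settings used in the study.</p>
      <preformat>
```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: the bundled 8x8 digits set stands in for MNIST
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X / 16.0, y, test_size=0.25, random_state=0
)

results = {}
for kernel in ("rbf", "linear", "poly"):
    for C in (1, 10, 50):  # regularization parameter values (illustrative)
        clf = SVC(kernel=kernel, C=C, gamma="scale")
        clf.fit(X_train, y_train)
        results[(kernel, C)] = clf.score(X_test, y_test)

for (kernel, C), acc in sorted(results.items()):
    print(f"kernel={kernel:6s} C={C:3d} accuracy={acc:.4f}")
```
      </preformat>
      <p>The same loop applies to MNIST itself, although the fit time grows at least quadratically with the number of samples, as noted above.</p>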
      <p>A random forest classifier is a meta estimator that fits a number of decision tree classifiers on
various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control
over-fitting. The sub-sample size is controlled by the max_samples parameter if bootstrap=True
(default), otherwise the whole dataset is used to build each tree. This classifier was
adjusted by selecting the optimal number of trees in the forest (the n_estimators parameter in
scikit-learn).</p>
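      <p>The effect of the number of trees can be explored with a short sketch such as the one below. As an assumption of this example, the bundled 8x8 digits set again replaces MNIST, and the tree counts tried are illustrative.</p>
      <preformat>
```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: the bundled 8x8 digits set stands in for MNIST
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

rf_scores = {}
for n_trees in (10, 50, 200):
    # n_estimators is the number of trees; bootstrap=True draws a sub-sample per tree
    rf = RandomForestClassifier(n_estimators=n_trees, bootstrap=True, random_state=0)
    rf.fit(X_train, y_train)
    rf_scores[n_trees] = rf.score(X_test, y_test)
    print(n_trees, round(rf_scores[n_trees], 4))
```
      </preformat>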
      <p>Scikit-learn package implements two different nearest neighbors classifiers: KNeighborsClassifier
implements learning based on the k-nearest neighbors of each query point, where k is an integer value
specified by the user. RadiusNeighborsClassifier implements learning based on the number of
neighbors within a fixed radius r of each training point, where r is a floating-point value specified by
the user.</p>
      <p>The k-neighbors classification in KNeighborsClassifier is the most commonly used technique. The
optimal choice of the value k is highly data-dependent: in general, a larger one suppresses the effects
of noise, but makes the classification boundaries less distinct.</p>
      <p>In this work, KNeighborsClassifier was used, and the number of neighbors considered for
each sample was tuned.</p>
      <p>The original multilayer perceptron was also implemented using the scikit-learn package.
Experience in setting up MLPClassifier showed that the following parameters have the main influence on
the accuracy of character recognition:
        <list list-type="bullet">
          <list-item>
            <p>hidden_layer_sizes: the i-th element represents the number of neurons in the i-th hidden layer;</p>
          </list-item>
          <list-item>
            <p>solver: the solver for weight optimization;</p>
          </list-item>
          <list-item>
            <p>learning_rate_init: the initial learning rate, which controls the step size in updating the weights (only used when solver=‘sgd’ or ‘adam’).</p>
          </list-item>
        </list>
      </p>
      <p>It was found that acceptable accuracy is achieved with a sufficiently
large number of neurons in the hidden layers, the sgd or adam optimizer, and a moderate initial learning
rate.</p>
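      <p>A configuration along these lines is sketched below. The layer sizes, learning rate and iteration limit are assumed values for illustration, not those found in the study, and the bundled 8x8 digits set is used in place of MNIST.</p>
      <preformat>
```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Assumption: the bundled 8x8 digits set stands in for MNIST
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X / 16.0, y, test_size=0.25, random_state=0
)

mlp = MLPClassifier(
    hidden_layer_sizes=(128, 128),  # two hidden layers of 128 neurons (assumed sizes)
    solver="adam",                  # weight optimizer
    learning_rate_init=0.001,       # moderate initial learning rate (assumed value)
    max_iter=300,
    random_state=0,
)
mlp.fit(X_train, y_train)
mlp_accuracy = mlp.score(X_test, y_test)
print(round(mlp_accuracy, 4))
```
      </preformat>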
      <p>All variants of the classifier models after tuning on the MNIST dataset were saved using the joblib
module.</p>
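      <p>Saving and restoring a fitted classifier with joblib can be sketched as follows. The file name is hypothetical, and a small SVC trained on the bundled digits set stands in for the tuned models.</p>
      <preformat>
```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import SVC

# Stand-in model: an SVC fitted on the bundled 8x8 digits set
X, y = load_digits(return_X_y=True)
model = SVC(kernel="rbf", C=10).fit(X, y)

# Hypothetical file name; any writable path works
path = os.path.join(tempfile.mkdtemp(), "svc_digits.joblib")
joblib.dump(model, path)

# Later, possibly in another program: load the model and predict without retraining
restored = joblib.load(path)
same = np.array_equal(model.predict(X[:20]), restored.predict(X[:20]))
print(same)
```
      </preformat>
      <p>Separating training from recognition in this way means the recognition program only needs the saved file, not the training data.</p>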
      <p>The other variant of Multilayer perceptron was implemented using TensorFlow package with
Keras interface.</p>
      <p>The model for recognizing digits included the input and output layers, as well as one or more
hidden layers. The model was trained using a variable number of steps (epochs). Images for recognition
were scaled to a size of 28x28 (784 cells in a one-dimensional representation), so the number of
neurons in the input and hidden layers was set to 784.</p>
      <p>The output layer has 10 nodes with the tf.nn.softmax activation, which returns an array of ten
probability estimates whose sum is 1. Each node contains an estimate that indicates the likelihood that the
current image belongs to one of the 10 classes.</p>
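      <p>A model of this shape can be sketched in Keras as follows. The ReLU activation of the hidden layers, the adam optimizer, the loss function and the helper name build_mlp are assumptions, since the text does not specify them; the random input only demonstrates the shape of the output.</p>
      <preformat>
```python
import numpy as np
import tensorflow as tf


def build_mlp(hidden_layers=2, units=784):
    """Fully connected network: 784 inputs, `hidden_layers` dense layers, 10 softmax outputs."""
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(784,)))
    for _ in range(hidden_layers):
        # ReLU is an assumed activation; the text does not name one
        model.add(tf.keras.layers.Dense(units, activation="relu"))
    # Ten output nodes whose softmax probabilities sum to 1
    model.add(tf.keras.layers.Dense(10, activation="softmax"))
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model


model = build_mlp()
# Random vectors stand in for three flattened, normalized 28x28 images
probs = model.predict(np.random.rand(3, 784).astype("float32"), verbose=0)
print(probs.shape, probs.sum(axis=1))
```
      </preformat>
      <p>The predicted digit for each image is the index of the largest of the ten probabilities, e.g. probs.argmax(axis=1).</p>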
      <p>For recognition, each image of the digit was converted to a size of 28x28, and then fed to the input
of a pre-trained neural network.</p>
      <p>When setting up the model according to the MNIST test data or a set of images for further
recognition, it was found that the recognition accuracy increases slightly with the increase in the
number of hidden layers. The model setup is much more sensitive to the choice of the type and
parameters of the optimizer and the number of training epochs.</p>
      <p>The implementation of handwritten digit recognition by Convolutional Neural Network was done
using TensorFlow and Keras.</p>
      <p>Variants of structures of the deep learning neural network based on Keras framework, which were
used to recognize car number elements, are shown in Fig. 1.</p>
      <p>Figure 1: Structure variants of the deep learning neural network based on the Keras framework, which
were used to recognize car number elements: a) sequential neural network (feed-forward network, FNN)
with three dense inner layers; b) convolutional neural network (CNN)</p>
      <p>The simpler version, a fully connected neural network (Fig. 1, a), provided a
recognition error level of 1.3-1.5% on the test set, but on real examples the recognition accuracy did
not exceed 70% [33].</p>
      <p>A more accurate digit recognition of areas of interest was achieved using a convolutional neural
network (CNN, see Fig. 1b). The disadvantage of this type of networks is a significantly longer
duration of training.</p>
      <p>In the sample of 15 ladle car numbers, which included a total of 42 digits (12 three-digit and 3
two-digit numbers), 41 digits were correctly recognized using a convolutional neural network, so that
the recognition accuracy was 97.6%. Recognition errors are connected with poorly written numbers.</p>
      <p>The stages of constructing and training a model and of recognizing practical samples are easily
separated because models can be exported. To do this, the model.save(filepath) method was
used to save the Keras model in a single HDF5 file, which contains:
        <list list-type="bullet">
          <list-item>
            <p>the architecture of the model, which allows the model to be restored;</p>
          </list-item>
          <list-item>
            <p>the model weights;</p>
          </list-item>
          <list-item>
            <p>the training configuration (loss function, optimizer);</p>
          </list-item>
          <list-item>
            <p>the state of the optimizer, which allows training to be resumed exactly where it was stopped.</p>
          </list-item>
        </list>
      </p>
      <p>To restore the model, the function keras.models.load_model(filepath) was then used. This
function also compiles the model using the saved training configuration (unless the model was never
compiled).</p>
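      <p>The save and restore cycle can be sketched as follows. The tiny network, the random training data and the file name are placeholders chosen for this example; only model.save and load_model correspond to the calls named in the text.</p>
      <preformat>
```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Placeholder network, much smaller than the ones used for recognition
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# One epoch on random data, only so that the optimizer has a state to save
x = np.random.rand(8, 784).astype("float32")
y = np.random.randint(0, 10, size=8)
model.fit(x, y, epochs=1, verbose=0)

# Hypothetical file name; the .h5 suffix selects the HDF5 format
path = os.path.join(tempfile.mkdtemp(), "digits_model.h5")
model.save(path)

# Restore the model and check that it gives the same predictions
restored = tf.keras.models.load_model(path)
match = np.allclose(model.predict(x, verbose=0), restored.predict(x, verbose=0), atol=1e-6)
print(match)
```
      </preformat>
      <p>Recent Keras releases may print a notice that HDF5 is a legacy format; the file is still written and can be loaded as shown.</p>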
      <p>However, when using customized models for recognizing handwritten numbers for other technical
objects – carriage numbers, photographs of meter readings, etc., for all model variants (including
those based on CNN), the number of errors increased sharply – up to 40-50%, depending on the
dataset.</p>
      <p>With proper tuning of all the models mentioned above (from KNN to CNN – as the complexity
grows), the accuracy of the MNIST test sample estimate was 97.5-98.5%, which is slightly inferior to
the best results achieved using preconfigured or convolutional neural networks.</p>
      <p>The results of the SVC classifier as part of the scikit-learn package are shown in Fig. 2.</p>
      <p>Figure 2: The results of the recognition accuracy evaluation for the SVC classifier with different
calibration parameters: a) comparison of recognition results for different kernels (kernels «rbf»,
«linear», «poly» were used); b) comparison of recognition results for different values of the
regularization parameter</p>
      <p>As can be seen from the results shown in Fig. 2, the best accuracy of image recognition from the
MNIST sample using the SVC classifier is achieved for the «rbf» kernel and a regularization
parameter of at least 50.</p>
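      <p>A comparison of this kind can be sketched with scikit-learn as shown below. The sketch is illustrative only: it uses the small 8x8 digits set bundled with scikit-learn as a stand-in for MNIST, and the split, the helper name svc_accuracy and the tested values of C are assumptions, so the resulting numbers are not those of Fig. 2.</p>
      <preformat>
```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Small 8x8 digits set bundled with scikit-learn, used as a stand-in for MNIST.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X / 16.0, y, test_size=0.25, random_state=0, stratify=y)


def svc_accuracy(kernel, C):
    """Fit an SVC with the given kernel and regularization parameter C."""
    clf = SVC(kernel=kernel, C=C)
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)


# a) effect of the kernel at a fixed regularization parameter
kernel_scores = {k: svc_accuracy(k, C=10) for k in ("rbf", "linear", "poly")}
# b) effect of the regularization parameter for the rbf kernel
c_scores = {C: svc_accuracy("rbf", C) for C in (1, 10, 50, 100)}

for name, acc in kernel_scores.items():
    print(f"kernel={name}: accuracy={acc:.3f}")
for C, acc in c_scores.items():
    print(f"C={C}: accuracy={acc:.3f}")
```
</preformat>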
      <p>Similar studies were carried out for the other classifiers: KNN Classifier, Random Forest
Classifier and Multilayer Perceptron Classifier. The influence of the tuning parameters of the
KNN Classifier and the Random Forest (RF) Classifier is shown in Fig. 3.</p>
      <p>As can be seen from Fig. 3, a decrease in the number of nearest neighbors for the KNN Classifier
or an increase in the number of trees in the forest for the RF Classifier leads to an increase in
recognition accuracy. However, neither of these classification algorithms leaves much room for
improving the recognition accuracy by varying the settings.</p>
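      <p>The two parameter sweeps can be sketched as follows. As an assumption for the example, the bundled 8x8 digits set of scikit-learn replaces MNIST, and the tested values of n_neighbors and n_estimators are chosen for illustration, so the numbers will differ from those in Fig. 3.</p>
      <preformat>
```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data set (assumption): 8x8 digits bundled with scikit-learn.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Sweep the number of nearest neighbors for KNN.
knn_scores = {}
for k in (1, 3, 5, 9, 15):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    knn_scores[k] = knn.score(X_test, y_test)

# Sweep the number of trees for the random forest.
rf_scores = {}
for n in (10, 50, 100, 200):
    rf = RandomForestClassifier(n_estimators=n, random_state=0)
    rf_scores[n] = rf.fit(X_train, y_train).score(X_test, y_test)

print("KNN:", knn_scores)
print("RF:", rf_scores)
```
</preformat>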
      <p>The settings of the multilayer perceptron had a great influence on the recognition accuracy. The
results of the computational experiment are shown in Fig. 4. As can be seen from the graphs
presented, the recognition accuracy of digit samples from the test sample increases with an increase in
the number of neurons in the hidden layers (it was assumed to be the same for all layers) and with an
increase in the number of hidden layers.</p>
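      <p>An experiment of this form can be sketched with the MLPClassifier class of scikit-learn. The data set (the bundled 8x8 digits instead of MNIST), the layer counts and the layer widths below are assumptions chosen to keep the example small; as in the experiment described above, every hidden layer has the same number of neurons.</p>
      <preformat>
```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in data set (assumption): 8x8 digits bundled with scikit-learn.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X / 16.0, y, test_size=0.25, random_state=0, stratify=y)

mlp_scores = {}
for n_layers in (1, 2, 4):
    for n_neurons in (16, 64):
        # The same width is used for every hidden layer.
        clf = MLPClassifier(hidden_layer_sizes=(n_neurons,) * n_layers,
                            max_iter=500, random_state=0)
        clf.fit(X_train, y_train)
        mlp_scores[(n_layers, n_neurons)] = clf.score(X_test, y_test)

for (n_layers, n_neurons), acc in sorted(mlp_scores.items()):
    print(f"layers={n_layers}, neurons={n_neurons}: accuracy={acc:.3f}")
```
</preformat>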
      <p>The graphs in Fig. 4 were built with the MLPClassifier class from the scikit-learn package. A
similar classifier built with the Keras package showed a similar dependence of the recognition
accuracy on the number of hidden layers and the number of neurons per layer.</p>
      <p>Figure 3: The results of the recognition accuracy evaluation for KNeighborsClassifier and Random
Forest Classifier with different calibration parameters: a) KNeighborsClassifier results;
b) Random Forest Classifier results</p>
      <sec id="sec-9-1">
        <title>Number of neurons in the hidden layers</title>
      </sec>
      <sec id="sec-9-2">
        <title>Number of hidden layers: 1 4 8</title>
        <p>Figure 4: The results of the recognition accuracy evaluation for MLP Classifier (scikit-learn
package) with different parameters</p>
        <p>When testing various recognition models, some features of the MNIST dataset were established
(Fig. 5):
- all images of digits have a clearly defined border area;
- images are grayscale, and the brightness of pixels varies across the width of the digit stroke;
- the images of the digits are centered in the 28x28 area.</p>
        <p>Figure 5: Sample images from the MNIST database </p>
        <p>However, the area-of-interest images submitted for recognition are effectively black and white,
either because the image is binarized at a preprocessing stage or because it is black and white
to begin with.</p>
        <p>To improve the recognition accuracy, it is necessary to perform two additional stages of image
preprocessing:
- after the area of interest is extracted exactly along the boundaries of the digit,
this part of the image is centered in a square area;
- a border with a width of 15-25% of the size of the square area is added around the image.</p>
        <p>The width of the border area significantly affects the reliability of digit recognition (both printed
and handwritten).</p>
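        <p>The two preprocessing stages can be sketched in NumPy as follows. The function name center_with_border and its parameters are hypothetical, the bounding box is taken directly from the nonzero pixels rather than from an OpenCV contour, and the final resizing to 28x28 is left out.</p>
        <preformat>
```python
import numpy as np


def center_with_border(binary_img, border_frac=0.2):
    """Crop a digit to its bounding box, center it in a square, add a border.

    binary_img: 2-D array in which nonzero pixels belong to the digit.
    border_frac: border width as a fraction of the square side
                 (0.15-0.25 according to the text).
    """
    rows = np.any(binary_img, axis=1)
    cols = np.any(binary_img, axis=0)
    if not rows.any():
        raise ValueError("image contains no digit pixels")
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    digit = binary_img[r0:r1 + 1, c0:c1 + 1]

    # Center the crop in a square whose side is the larger crop dimension.
    h, w = digit.shape
    side = max(h, w)
    square = np.zeros((side, side), dtype=binary_img.dtype)
    top, left = (side - h) // 2, (side - w) // 2
    square[top:top + h, left:left + w] = digit

    # Add an empty border of equal width on every side.
    border = int(round(border_frac * side))
    return np.pad(square, border, mode="constant", constant_values=0)


# Usage: a 3x2 stroke placed off-center in a 10x10 image.
img = np.zeros((10, 10), dtype=np.uint8)
img[1:4, 6:8] = 255
out = center_with_border(img, border_frac=0.2)
print(out.shape)
```
</preformat>
        <p>The result would still have to be scaled to the 28x28 input size of the models, for example with cv2.resize.</p>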
        <p>The influence of the width of the added border region on the recognition accuracy for a set of
handwritten characters 0-9 is shown in Fig. 6. The computational experiment for
constructing this curve was performed using the scikit-learn implementation of the SVC algorithm.</p>
        <p>The effect of the width of the added border region was also analyzed for digit recognition
using neural networks. The shape of the curve in Fig. 6 for the variants with a multilayer
perceptron or a CNN remained exactly the same as for the classification algorithms. An
example of recognizing a set of handwritten digits 0-9 is shown in Fig. 7.</p>
        <p>a) Border width 20% of the maximum image size. Recognition accuracy is 100%.
b) Border width 50% of the maximum image size. Recognition accuracy is 80% (recognition errors
are indicated by arrows).</p>
        <p>Figure 7: Examples of recognition results with CNN deep learning network for a group of handwritten
digits with different widths of the added border area</p>
        <p>To investigate the causes of recognition failures, a test program was developed in Python in
which digits can be drawn by hand and the state of the recognition area and the result can be
monitored (Fig. 8).</p>
        <p>Figure 8: Appearance of the program window for testing handwritten character recognition </p>
        <p>After building recognition models using all the algorithms mentioned above, the recognition
accuracy of all handwritten digits on the test program turned out to be in the range of 98-100% (one
or no errors per 50 drawn digits).</p>
        <p>Another way to improve the recognition accuracy is to adapt the training base of the models
to black and white images. For this, images from the MNIST dataset were converted to
black and white.</p>
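        <p>Such a conversion can be sketched as a simple global threshold. The function name binarize and the threshold of 128 are assumptions made for the example; the text does not state which threshold or binarization method was used.</p>
        <preformat>
```python
import numpy as np


def binarize(images, threshold=128):
    """Map grayscale pixels (0-255) to pure black (0) or pure white (255)."""
    images = np.asarray(images)
    return np.where(images >= threshold, 255, 0).astype(np.uint8)


# Usage on a tiny grayscale patch; works the same on an (N, 28, 28) array.
gray = np.array([[0, 40, 127],
                 [128, 200, 255]], dtype=np.uint8)
bw = binarize(gray)
print(bw)
```
</preformat>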
        <p>The results obtained are largely related to the way the areas of interest containing digits
are extracted from the image. The contour selection (the findContours function in OpenCV)
follows the border of the found contour exactly. The selected area, a rectangle circumscribed
around the contour of the digit, differs noticeably from the images of the training base (MNIST)
unless the region of interest is processed further.</p>
        <p>A similar result was achieved on industrial images. Regardless of the neural network
version used, the recognition accuracy was 96-98%. For the elements of the ladle car number,
binarization and the morphological “closing” operation (MORPH_CLOSE in terms of the OpenCV
library) were performed before recognition.</p>
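        <p>The closing operation is a dilation followed by an erosion. The sketch below implements it in plain NumPy as a stand-in for the OpenCV call cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel); the 3x3 square structuring element and the helper names are assumptions, since the kernel used in the study is not stated.</p>
        <preformat>
```python
import numpy as np


def _shifted_stack(img, pad_value):
    """Return the nine 3x3-neighbourhood shifts of img, padded with pad_value."""
    padded = np.pad(img, 1, mode="constant", constant_values=pad_value)
    h, w = img.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])


def closing_3x3(binary_img):
    """Morphological closing: dilation, then erosion, 3x3 square element."""
    img = np.asarray(binary_img).astype(bool)
    # Dilation: a pixel is set if any pixel in its 3x3 neighbourhood is set.
    dilated = _shifted_stack(img, False).any(axis=0)
    # Erosion: a pixel stays set only if its whole 3x3 neighbourhood is set.
    # Padding with True keeps the image edge from eroding the result.
    return _shifted_stack(dilated, True).all(axis=0)


# Usage: a horizontal stroke with a one-pixel gap; closing fills the gap.
stroke = np.zeros((7, 9), dtype=bool)
stroke[3, 2:7] = True
stroke[3, 4] = False
closed = closing_3x3(stroke)
print(closed.astype(int))
```
</preformat>
        <p>On binarized digits this kind of operation joins strokes that were broken by small gaps, while leaving the outer shape of the stroke unchanged.</p>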
      </sec>
    </sec>
    <sec id="sec-10">
      <title>6. Conclusion </title>
      <p>This paper implemented several models for handwritten digit recognition on the MNIST dataset:
Support Vector Machine Classifier, KNN Classifier, Random Forest Classifier, Multilayer Perceptron
Classifier (scikit-learn), a multilayer perceptron built with Keras, and a Convolutional Neural
Network. The models were compared by their recognition accuracy.</p>
      <p>1. After a simple setup, all these algorithms demonstrated almost the same accuracy of
handwritten digit recognition, differing by no more than 1%.
2. CNN gave the most accurate results for handwritten digit recognition; its only drawback is a
significantly longer training time.
3. To improve the recognition accuracy, it is necessary to perform additional stages of image
preprocessing and a transformation of the dataset:
- after the area of interest is extracted exactly along the boundaries of the digit, this
part of the image is centered in a square area;
- a border with a width of 15-25% of the size of the square area is added around the image;
- the images of the MNIST dataset are converted to black and white.</p>
      <p>After building recognition models using all the algorithms mentioned above, the recognition
accuracy of all handwritten digits in the test program turned out to be in the range of 98-100% (one
or no errors per 50 drawn digits). A similar result was obtained when recognizing the generated sets
of digits with different shapes: the recognition accuracy matches that achieved on the
MNIST test sample. For industrial images, regardless of the neural network version used, the
recognition accuracy was 96-98%.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>7. References</title>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Pramanik</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bag</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , (
          <year>2018</year>
          ).
          <article-title>Shape decomposition-based handwritten compound character recognition for Bangla OCR</article-title>
          .
          <source>Journal of Visual Communication and Image Representation</source>
          ,
          <volume>50</volume>
          ,
          <fpage>123</fpage>
          -
          <lpage>134</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1016/j.jvcir.2017.11.016</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Sarkhel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saha</surname>
            ,
            <given-names>A. K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nasipuri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , (
          <year>2016</year>
          ).
          <article-title>A multi-objective approach towards cost effective isolated handwritten Bangla character and digit recognition</article-title>
          .
          <source>Pattern Recognition</source>
          ,
          <volume>58</volume>
          ,
          <fpage>172</fpage>
          -
          <lpage>189</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1016/j.patcog.2016.04.010</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>González</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bergasa</surname>
            <given-names>L. M.</given-names>
          </string-name>
          , (
          <year>2013</year>
          ).
          <article-title>A text reading algorithm for natural images</article-title>
          .
          <source>Image and Vision Computing</source>
          .
          <volume>31</volume>
          ,
          <issue>3</issue>
          ,
          <fpage>255</fpage>
          -
          <lpage>274</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1016/j.imavis.2013.01.003</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Sanver</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          , (
          <year>2014</year>
          ).
          <article-title>Identification of train wagon numbers</article-title>
          .
          <source>In: Proceedings of the 2014 IEEE NW Russia Young Researchers in Electrical and Electronic Engineering Conference, St. Petersburg</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>63</fpage>
          -
          <lpage>68</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1109/ElConRusNW.2014.6839203</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Lisanti</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karaman</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pezzatini</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Del Bimbo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , (
          <year>2015</year>
          ).
          <article-title>A Multi-Camera Image Processing and Visualization System for Train Safety Assessment</article-title>
          . https://arxiv.org/pdf/1507.07815.pdf
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Surekha</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurudath</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prithvi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Ritesh</given-names>
            <surname>Ananth</surname>
          </string-name>
          ,
          <string-name>
            <surname>V.G.</surname>
          </string-name>
          , (
          <year>2018</year>
          ).
          <article-title>Automatic license plate recognition using image processing and neural network</article-title>
          .
          <source>ICTACT journal on image and video processing</source>
          ,
          <volume>08</volume>
          ,
          <issue>04</issue>
          ,
          <fpage>1786</fpage>
          -
          <lpage>1792</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.21917/ijivp.2018.0251</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Shamim</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          ,
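          <string-name>
            <surname>Miah</surname>
            ,
            <given-names>Md Badrul</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sarker</surname>
            ,
            <given-names>Angona</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rana</surname>
            ,
            <given-names>Masud</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jobair</surname>
            ,
            <given-names>Abdullah</given-names>
          </string-name>
          , (
          <year>2018</year>
          ).
          <article-title>Handwritten Digit Recognition Using Machine Learning Algorithms</article-title>
          .
          <source>Indonesian Journal of Science and Technology</source>
          .
          <volume>18</volume>
          . doi:
          <pub-id pub-id-type="doi">10.17509/ijost.v3i1.10795</pub-id>
          .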
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Dutt</surname>
          </string-name>
          , Anuj, Dutt, Aashi, (
          <year>2017</year>
          ).
          <article-title>Handwritten Digit Recognition Using Deep Learning</article-title>
          ,
          <source>International Journal of Advanced Research in Computer Engineering &amp; Technology (IJARCET)</source>
          .
          <volume>6</volume>
          ,
          <issue>7</issue>
          ,
          <fpage>990</fpage>
          -
          <lpage>997</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Norhidayu binti Abdul Hamid</surname>
          </string-name>
          , Nilam Nur Binti Amir Sharif, (
          <year>2017</year>
          ).
          <article-title>Handwritten recognition using SVM, KNN, and Neural networks</article-title>
          , https://arxiv.org/pdf/1702.00723.pdf
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Siddique</surname>
          </string-name>
          , Fathma, Sakib, Shadman, Siddique, Abu Bakr. (
          <year>2019</year>
          ).
          <article-title>Recognition of Handwritten Digit using Convolutional Neural Network in Python with Tensorflow and Comparison of Performance for Various Hidden Layers</article-title>
          . https://www.researchgate.net/publication/335908757_Recognition_of_Handwritten_Digit_using_Convolutional_Neural_Network_in_Python_with_Tensorflow_and_Comparison_of_Performance_for_Various_Hidden_Layers
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Farhana</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abu</surname>
            <given-names>Sufian</given-names>
          </string-name>
          , Dutta,
          <string-name>
            <surname>P.</surname>
          </string-name>
          ,
          <article-title>Advancements in Image Classification Using Convolutional Neural Network</article-title>
          . https://arxiv.org/pdf/1905.03288.pdf
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , (
          <year>2017</year>
          ).
          <article-title>Comparison of machine learning methods for classifying mediastinal lymph node metastasis of non-small cell lung cancer from 18F-FDG PET/CT images</article-title>
          .
          <source>EJNMMI Research</source>
          <volume>7</volume>
          :11. doi:
          <pub-id pub-id-type="doi">10.1186/s13550-017-0260-9</pub-id>
          . https://arxiv.org/pdf/1702.02223.pdf
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Arif</surname>
            ,
            <given-names>R. B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Siddique</surname>
            ,
            <given-names>M. A. B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khan</surname>
            ,
            <given-names>M. M. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oishe</surname>
            ,
            <given-names>M. R.</given-names>
          </string-name>
          , (
          <year>2018</year>
          ).
          <article-title>Study and Observation of the Variations of Accuracies for Handwritten Digits Recognition with Various Hidden Layers and Epochs using Convolutional Neural Network</article-title>
          .
          <source>In: 4th International Conference on Electrical Engineering and Information &amp; Communication Technology (iCEEiCT)</source>
          , Dhaka, Bangladesh,
          <year>2018</year>
          . pp.
          <fpage>112</fpage>
          -
          <lpage>117</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Siddique</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sakib</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Siddique</surname>
            ,
            <given-names>M.A.B.</given-names>
          </string-name>
          , (
          <year>2019</year>
          ).
          <article-title>Handwritten Digit Recognition using Convolutional Neural Network in Python with Tensorflow and Observe the Variation of Accuracies for Various Hidden Layers</article-title>
          .
          <source>Preprints</source>
          <year>2019</year>
          ,
          <volume>2019030039</volume>
          . doi:
          <pub-id pub-id-type="doi">10.20944/preprints201903.0039.v1</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <article-title>The MNIST database of handwritten digits</article-title>
          . http://yann.lecun.com/exdb/mnist.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Grother</surname>
            ,
            <given-names>P.J.</given-names>
          </string-name>
          ,
          <article-title>Nist special database 19 - handprinted forms and characters database // National Institute of Standards and Technology (NIST)</article-title>
          ,
          <source>Tech. Rep</source>
          .
          <year>1995</year>
          . https://www.nist.gov/srd/nistspecial-database-19.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Vapnik</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <article-title>The Nature of Statistical Learning Theory</article-title>
          . NY: Springer-Verlag (
          <year>1995</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>