Comparison of Neural Network and KNN Classifiers for Recognizing Hand-written Digits

Iwo Różycki, Adam Wolszleger

Faculty of Applied Mathematics, Silesian University of Technology, Kaszubska 23, 44-100 Gliwice

SYSTEM 2020: Symposium for Young Scientists in Technology, Engineering and Mathematics, Online, May 20 2020
iworozy581@student.polsl.pl (I. Różycki); adamwol165@student.polsl.pl (A. Wolszleger)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

Abstract
This paper presents a comparison between two types of classifiers, a neural network and KNN (k nearest neighbors). The objective is to decide which one better handles recognizing handwritten digits from zero to nine. In this project we designed a program that analyses data from a database and feeds it to the classifiers, which in turn analyse the input from the user and make a prediction based on the information gained from the training data. The aim is to see whether there are any significant differences in the accuracy and speed of the learning processes. The comparison is made based on the accuracy of the predictions with different configurations and parameters. The two classifiers work in different ways and can be used efficiently for different purposes, and this paper presents data that will help decide whether either of them is better suited to this particular task.

Keywords
Hand-written digits, Handwritten signature verification, Handwritten character recognition, Neural networks

1. Introduction

Artificial neural networks are computing systems based on the anatomy of the human brain. The brain is made up of billions of neurons, interconnected by quadrillions of connections called synapses, which allow information to flow between the neurons. Because of this huge network of connections, the human brain still exceeds any supercomputer in terms of processing power. Its structure can be imitated by artificial networks, which use artificially created neurons and synapses that are able to process information and make predictions [1, 2, 3] and guesses based on the knowledge gained [4, 5]. This process is known as deep learning and is widely used today in voice recognition and driverless cars, and can be found in devices we use every day, such as smartphones or TVs [6]. In deep learning, a computer model learns classification directly from data in the form of images, sound or text. A well constructed model can even exceed human ability.

KNN, or k nearest neighbors, is a much simpler algorithm that classifies new data based on a similarity measure, usually a distance function, to the already stored cases [7]. The assumption made by this algorithm is that similar things exist close to each other, and it is most useful when that assumption is true. The idea behind it is to calculate the straight-line distance, also known as the Euclidean distance, between the input data and all the stored data representing the possible classes the input can be assigned to. It therefore relies on this data set being complete and is unable to recognize objects beyond what is already known to it.

There are many approaches to classifying text and handwritten characters, such as ranking-based evaluation [8], convolutional feature extraction [9] or a devoted benchmark [10]. This project aims to compare the two classifiers in the area of recognizing handwritten digits. This is an important area, as recognition of handwriting is used in a variety of modern devices. We test different configurations of parameters for both the neural network and KNN, looking for significant differences in performance in terms of accuracy and speed.

1.1. Related works

These two classifiers have been compared in various articles and areas of research.
In one of them, the classifiers were used to monitor the condition of machines by analyzing the vibration of journal bearings. In that research another technique, more commonly used in this area, was also included for comparison, and in the end the study concluded that the neural network performed better at this particular task. Another piece of research that should be mentioned compared these models at classifying magnetic resonance images (MRI). This is a very important area of research, as MRI scans are not always easy for a human to interpret, and automated analysis would increase the probability of a correct diagnosis. A special technique was used on the images to extract the most important features, which were then fed into a feed-forward back-propagation neural network and a KNN classifier. In that research KNN came out on top in terms of accuracy [11, 12, 13, 14, 15, 16]. As these examples show, both neural networks and KNN can perform well in different conditions, and only a close comparison can reveal which one performs better in a given area.

2. Mathematical model

For this project we needed various functions to modify the data as it passes between the neurons. The function used to sum the inputs from the previous layer of neurons is

Input = X₁·W₁ + X₂·W₂ + X₃·W₃ + … + Xₙ·Wₙ    (1)

where X represents the output from a given neuron and W is the weight attached to the synapse. After the input has been weighted and summed, it is modified by the activation function, which shapes the input so that the output from the model can be interpreted and used as needed. The function used in this project is the sigmoid function, shown in Figure 1:

f(x) = 1 / (1 + exp(−x))    (2)

Its derivative, used later for back propagation, is

f′(x) = (1 / (1 + exp(−x))) · (1 − 1 / (1 + exp(−x)))    (3)

Figure 1: Sigmoid function.

As seen on the graph, the function only takes values between 0 and 1, which is why it is used in models that predict a probability as their output. This makes it ideal for this project, as the aim is to produce a probability with which the input resembles one of the digits. For the back-propagation algorithm we need to calculate the error of the neurons in order to correctly modify the weights of the synapses. For the last layer of neurons the following equation was used:

ΔW = LR · (expected − output) · f′(x) · sⱼ    (4)

Here ΔW represents the change in weight, LR is the learning rate chosen for the network, f′(x) is the derivative of the activation function and sⱼ is the input from the j-th neuron of the previous layer. To support this equation we define an error gradient for each neuron. For the last layer it is calculated as

grad = f′(x) · (expected − output)    (5)

and for the other layers as

grad = f′(x) · Σᵢ₌₁ⁿ wᵢ·dᵢ    (6)

where each weight wᵢ is multiplied by the error gradient dᵢ of the i-th neuron connected to the current neuron. Based on the equations above, ΔW simplifies to

ΔW = LR · d · s    (7)

For KNN we used the Euclidean distance as the measure of similarity, calculated with the following equation:

D(X, Y) = √((X₁ − Y₁)² + … + (Xₙ − Yₙ)²)    (8)

Here X represents the known data and Y represents the input. Based on the calculated distances, the algorithm determines which class of objects appears most often among the k closest neighbors. The value of k was decided after experimenting with different values.
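To make these update rules concrete, the short Python sketch below applies equations (1), (2), (3), (5) and (7) to a single output layer. It is a minimal illustration under our own naming assumptions (sigmoid, output_layer_update, learning_rate and so on), not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    # Equation (2): squashes the weighted input into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # Equation (3): f'(x) = f(x) * (1 - f(x))
    s = sigmoid(x)
    return s * (1.0 - s)

def output_layer_update(weights, inputs, expected, learning_rate):
    """One back-propagation step for the output layer.

    weights:  (n_inputs, n_outputs) synapse weights
    inputs:   (n_inputs,) outputs of the previous layer (s_j in eq. 4)
    expected: (n_outputs,) target vector
    """
    x = inputs @ weights                                  # equation (1): weighted sum
    output = sigmoid(x)                                   # equation (2)
    grad = sigmoid_derivative(x) * (expected - output)    # equation (5)
    weights += learning_rate * np.outer(inputs, grad)     # equation (7): LR * d * s
    return output, grad
```

For a hidden layer the same pattern applies, except that the gradient is computed from the weighted gradients of the following layer, as in equation (6).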
3. Description of the proposed system

The deep learning process of the neural network is based on the idea of data being fed into the neurons and modified as it passes through the successive layers. To achieve this we have to create a model that uses various functions to modify the data appropriately. We start with the building block of the neural network, the neuron. A neuron receives information via synapses from the neurons connected to it. Each synapse has a weight attached to it, which is generated randomly when the synapse is first created.

The data that comes into a neuron is weighted using equation (1). This means that the input depends on both the output of the previous neuron and the weight of the synapse that connects them. When all the inputs have been summed, the data is modified by the activation function of the neuron; the function used in this project is the sigmoid function, for the reasons described previously. The neurons within the network are organized into layers. Two layers, the first and the last, are present in every neural network. The middle layers, known as hidden layers, are optional, but can be added to improve the accuracy of the learning process. The optimal number of hidden layers and the configuration of neurons within them were decided through testing of the network. For the input data we used bitmaps of 28x28 pixels, each representing one of the ten digits, so the first layer needed one neuron for each individual pixel of the picture. The number of neurons in the last layer equals the total number of classes that the input can be assigned to, which in this project was ten. The experiments we conducted to choose the number of hidden layers are described later in the article. To modify the weights of the synapses we used the back-propagation algorithm described in the previous section.

The KNN classifier was much simpler to set up. In this algorithm objects are classified based on their closeness to the data that the algorithm was trained on. When an input is presented, the Euclidean distance between it and all the stored examples is calculated. The distances are then sorted, with the smallest at the top of the list. The value of k determines how many of these distances are considered during the voting process, in which the algorithm checks which class appears most frequently among the chosen neighbors. This class is then given as the prediction for the input. Consequently, only the value of k has to be determined before setting up the classifier; it was chosen based on the tests described later.

The data used for this project was taken from the MNIST database and is split into a training set and a test set. We trained both classifiers on the training set and then used the test set to determine their accuracy, calculated as the fraction of cases in which the model chose the correct digit based on the information gained from the training data. We used this accuracy to compare the two models in different scenarios.
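As a minimal sketch of the voting procedure just described, the Python fragment below classifies one flattened 28x28 image against the stored training set using equation (8). The names (knn_predict, train_images and so on) are ours, chosen only for illustration.

```python
import numpy as np
from collections import Counter

def knn_predict(train_images, train_labels, query, k):
    """Classify one flattened image by a majority vote of its k nearest neighbours.

    train_images: (n_samples, 784) array of stored training bitmaps
    train_labels: (n_samples,) array of digit labels 0-9
    query:        (784,) flattened input image
    """
    # Equation (8): Euclidean distance from the query to every stored example
    distances = np.sqrt(np.sum((train_images - query) ** 2, axis=1))
    # Keep only the indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # The most frequent class among those neighbours is the prediction
    votes = Counter(train_labels[nearest])
    return votes.most_common(1)[0][0]
```

With k = 10, for example, the ten closest training images decide the predicted digit.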
4. Experiments

For our experiments we decided to create a second set of training data by augmenting the one we already had. This was done by randomly modifying each picture with one of four functions: rotation by 90 degrees to the left, rotation by 90 degrees to the right, turning the image upside down, or adding 'noise', which simply changes some of the pixels to black. The purpose was to see whether the random nature of these modifications would make a significant difference to the classifiers. The final data sets contained 60000 images for the non-augmented data and 120000 for the augmented data (the latter contained both the augmented and the non-augmented images).
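The four modifications could be implemented roughly as follows, assuming each image is a 28x28 NumPy array; the 5% noise level and the function names are our own assumptions, not values taken from the original program.

```python
import numpy as np

def augment(image, rng):
    """Apply one randomly chosen modification to a 28x28 image array."""
    choice = rng.integers(4)
    if choice == 0:
        return np.rot90(image, 1)               # rotate 90 degrees to the left
    if choice == 1:
        return np.rot90(image, -1)              # rotate 90 degrees to the right
    if choice == 2:
        return np.rot90(image, 2)               # turn the image upside down
    noisy = image.copy()                        # add 'noise': blacken some pixels
    mask = rng.random(image.shape) < 0.05       # assumed noise rate of 5%
    noisy[mask] = 0                             # assuming black is encoded as 0
    return noisy

# Example of building the augmented set (names are illustrative):
# rng = np.random.default_rng()
# augmented = [augment(img, rng) for img in train_images_2d]
```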
For the first test we decided to check how the number of epochs affects the accuracy of the neural network when using augmented and non-augmented data. For this we kept the learning rate at 0.3 and added a single hidden layer of 32 neurons. This setup was chosen to decrease the learning time and to allow us to complete further tests. We started the experiment at 1 epoch and increased the number gradually up to 10 epochs. For the augmented data we only went up to 5 epochs, because of the increased size of the data. The results are presented in Figure 2. From the diagram we can see that the accuracy varied between 93.5% and almost 96% for the non-augmented data.

Figure 2: Epoch number test.

The highest accuracy was reached at 9 epochs, and the biggest jump can be observed when going from 4 to 5 epochs. Above 5 epochs the accuracy does not fall below 95% and remains rather constant. The average accuracy for the non-augmented data is 95.05% with a standard deviation of 0.0071%. For the augmented data the accuracy is visibly lower: it varied between 90% and 92%, so there was slightly less variation, but this might be due to the lower number of epochs tested. The biggest difference between the two sets can be seen at 5 epochs, where it reaches 4.52%. The best accuracy, 91.68%, is observed at 3 epochs, and it seems to decrease past this number. The average accuracy for the augmented data was 90.92% with a standard deviation of 0.0059%. The average difference between the two sets across the tested epoch counts was 3.69%. Based on these values we can conclude that the neural network was better at recognizing data without augmentation, but even with the random modifications it was able to correctly guess the digit over 90% of the time. Across both sets the average accuracy was 93.67% with a standard deviation of 0.0212%.

Another test was designed to check how the number of neurons in the hidden layer influences the accuracy. In this test we kept the learning rate at 0.3 and set the number of epochs to one. We started with 10 neurons in the hidden layer; the results can be seen in Figure 3. As we increased the number of neurons the accuracy increased, ranging from 86.5% up to almost 95% for 200 neurons. We ran the test with 200 neurons to check whether a bigger increase in the neuron count leads to a larger increase in accuracy. The downside of increasing this number is a higher learning time, although the increase was small. The biggest improvement can be seen between 10 and 20 neurons, where it reached 3.80%, while between 50 and 200 neurons the improvement was only 0.45%. From this we can conclude that the number of neurons in the hidden layer can be kept quite small; this does not adversely affect the accuracy and leads to faster learning. Even with a number as small as 10 the accuracy is still 86.86%.

Figure 3: Hidden layers test 1.

In the next test we added another hidden layer. We kept the same learning rate and number of epochs as in the previous test and set the first hidden layer to 50 neurons. The second layer started with 10 neurons and we increased this number gradually. The results can be seen in Figure 4 and are less conclusive. The best accuracy of 93.49% was achieved at 30 neurons, which is lower than what was achieved with a single hidden layer of 40 neurons. The worst accuracy of 92.96% was achieved at 20 neurons, which is higher than the worst accuracy with one hidden layer. Adding another hidden layer also increases the learning time, which has to be taken into consideration. From this we conclude that adding another layer of neurons will not necessarily lead to higher accuracy, but might be a safer option. In our final setup we decided to keep a single hidden layer, mostly because of the shorter learning time.

Figure 4: Hidden layer test 2.

The next test was designed to check how changing the value of k influences the accuracy of the KNN classifier. This is the only parameter we could modify, so it is the only test done for KNN. We started with k=10 and increased the number from there. KNN took a very long time when we used the whole test set for the accuracy calculation, so we decided to use only 100 elements from the test set, while the distances were calculated against the whole training set. The results can be seen in Figure 5. As can be seen on the graph, we decided to increase the values drastically, because a gradual increase led to insubstantial differences. The accuracy stays the same for low values of k, namely below 30, and then decreases as this number goes up; the decrease becomes more rapid as we reach values of 1000 and more. KNN handled the non-augmented data better, as was the case with the neural network, but here the differences varied between 1 and 2%, and at k=1000 the accuracy for the augmented data was actually higher, at 86% compared to 84%. The highest observed accuracy was higher for KNN, at 96.00% compared to 95.65% for the neural network. From this experiment we can conclude that lower values of k lead to higher accuracy of the classification process. This makes sense, as a higher value of k means that larger distances are taken into consideration during the voting process.

Figure 5: K value test.
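A possible way to script the k sweep on the 100-element test subset is sketched below. It reuses the knn_predict function from the sketch in Section 3, and the list of k values in the comment is only an example, not the exact grid we used.

```python
def accuracy_for_k(train_images, train_labels, test_images, test_labels, k, subset=100):
    """Fraction of correctly classified digits on a small test subset."""
    correct = 0
    for image, label in zip(test_images[:subset], test_labels[:subset]):
        if knn_predict(train_images, train_labels, image, k) == label:
            correct += 1
    return correct / subset

# Example sweep (illustrative values of k and data set names):
# for k in (10, 30, 100, 300, 1000, 3000):
#     print(k, accuracy_for_k(X_train, y_train, X_test, y_test, k))
```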
For comparison we collected the data from the experiments described previously and put it together on a single diagram, shown in Figure 6. As can be seen on the graph, the accuracy for KNN at low values of k is higher than that of the neural network. Increasing the number of epochs led to higher accuracy, which cannot be said for increasing the value of k. The biggest differences can be observed with the augmented data, where KNN achieved higher accuracy than the neural network, with the largest difference being 5%; at 3 epochs this difference is smaller. Based on this data we can say that KNN should, in theory, be better at correctly predicting the answer.

Figure 6: Combined results of previous tests.

We designed one last experiment to test this. We drew each digit 20 times and tested each classifier with and without the use of augmentation. In the table in Figure 7 we recorded the most prevalent answer for each case. From the table we can see that both classifiers were able to recognize the digits 0, 2, 3, 5 and 8 quite well, but the digits 7 and 9 posed a difficulty. The neural network was able to correctly guess the digit 80% of the time with both data sets, while the KNN classifier guessed correctly 70% of the time with both sets. The differences were not big enough to safely conclude which classifier was better; preferably this test would be carried out 100 or more times to give more meaningful results.

Figure 7: Table of results of the drawing test.

5. Conclusions

After carrying out the project and analyzing the data, it cannot be safely concluded that one classifier is better than the other in the area of recognizing handwritten digits. Although the tests that were conducted did show some differences in accuracy, especially when the data was augmented, not enough tests were performed to show that the differences between the classifiers were significant. What can be concluded from these tests is that there is indeed a difference in how they perform. KNN seemed to handle augmented data better, probably because all the training data was available to it for comparison with the test data. The neural network, on the other hand, increased in accuracy as we increased the number of epochs, since this allowed the weights of the synapses to be adjusted according to the errors in judgment the network made. With the back-propagation algorithm the network gets better each time we train it on the training data, whereas the KNN algorithm only needs to be given the training data once for it to work. Both classifiers show quite high accuracy when dealing with our data, but with the correct configuration this would preferably be increased to very close to 100%. More tests would have to be done, mostly to find the best possible setups for both models and to produce more statistically significant data. Finally, in the testing phase we noticed that there is a slight improvement in accuracy if the images are pre-filtered [17].
References

[1] M. Woźniak, D. Połap, Intelligent home systems for ubiquitous user support by using neural networks and rule-based approach, IEEE Transactions on Industrial Informatics 16 (2019) 2651–2658.
[2] C. Napoli, F. Bonanno, G. Capizzi, An hybrid neuro-wavelet approach for long-term prediction of solar wind, Proceedings of the International Astronomical Union 6 (2010) 153–155.
[3] C. Napoli, F. Bonanno, G. Capizzi, Exploiting solar wind time series correlation with magnetospheric response by using an hybrid neuro-wavelet approach, Proceedings of the International Astronomical Union 6 (2010) 156–158.
[4] G. Capizzi, G. L. Sciuto, C. Napoli, M. Woźniak, G. Susi, A spiking neural network-based long-term prediction system for biogas production, Neural Networks 129 (2020) 271–279.
[5] G. Cardarilli, L. Di Nunzio, R. Fazzolari, A. Nannarelli, M. Petricca, M. Re, Design space exploration based methodology for residue number system digital filters implementation, IEEE Transactions on Emerging Topics in Computing (2020).
[6] M. Woźniak, D. Połap, Soft trees with neural components as image-processing technique for archeological excavations, Personal and Ubiquitous Computing 24 (2020) 363–375.
[7] M. Husnain, S. Mumtaz, M. Coustaty, M. Luqman, J.-M. Ogier, S. Malik, Urdu handwritten text recognition: A survey, IET Image Processing (2020).
[8] N. D. Cilia, C. De Stefano, F. Fontanella, A. S. di Freca, A ranking-based feature selection approach for handwritten character recognition, Pattern Recognition Letters 121 (2019) 77–86.
[9] H.-h. Zhao, H. Liu, Multiple classifiers fusion and CNN feature extraction for handwritten digits recognition, Granular Computing (2019) 1–8.
[10] A. Islam, F. Rahman, A. S. A. Rabby, Sankhya: An unbiased benchmark for Bangla handwritten digits recognition, in: 2019 IEEE International Conference on Big Data (Big Data), IEEE, 2019, pp. 4676–4683.
[11] A. Venckauskas, A. Karpavicius, R. Damaševičius, R. Marcinkevičius, J. Kapočiūte-Dzikiené, C. Napoli, Open class authorship attribution of Lithuanian internet comments using one-class classifier, in: 2017 Federated Conference on Computer Science and Information Systems (FedCSIS), IEEE, 2017, pp. 373–382.
[12] A. Mishra, K. Kumar, P. Kumar, P. Mittal, A novel approach for handwritten character recognition using k-NN classifier, in: Soft Computing: Theories and Applications, Springer, 2020, pp. 887–894.
[13] C. Napoli, E. Tramontana, G. L. Sciuto, M. Wozniak, R. Damaevicius, G. Borowik, Authorship semantical identification using holomorphic Chebyshev projectors, in: 2015 Asia-Pacific Conference on Computer Aided System Engineering, IEEE, 2015, pp. 232–237.
[14] C. Napoli, G. Pappalardo, E. Tramontana, An agent-driven semantical identifier using radial basis neural networks and reinforcement learning, arXiv preprint arXiv:1409.8484 (2014).
[15] T. Jadhav, Handwritten signature verification using local binary pattern features and KNN, International Research Journal of Engineering and Technology (IRJET) 6 (2019) 579–586.
[16] M. Wróbel, J. T. Starczewski, C. Napoli, Handwriting recognition with extraction of letter fragments, in: International Conference on Artificial Intelligence and Soft Computing, Springer, 2017, pp. 183–192.
[17] G. Capizzi, S. Coco, G. Sciuto, C. Napoli, A new iterative FIR filter design approach using a Gaussian approximation, IEEE Signal Processing Letters 25 (2018) 1615–1619.