Comparison of Neural Network and KNN Classifiers for Recognizing Hand-written Digits

Iwo Różycki, Adam Wolszleger

Faculty of Applied Mathematics, Silesian University of Technology, Kaszubska 23, 44-100 Gliwice

SYSTEM 2020: Symposium for Young Scientists in Technology, Engineering and Mathematics, Online, May 20 2020
iworozy581@student.polsl.pl (I. Różycki); adamwol165@student.polsl.pl (A. Wolszleger)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

Abstract
This paper presents a comparison between two types of classifiers, a neural network and KNN (k nearest neighbors). The objective is to decide which one better handles recognizing handwritten digits from zero to nine. In this project we designed a program that analyses data from a database and feeds it to the classifiers, which in turn analyse the input from the user and make a prediction based on the information gained from the training data. The aim is to see whether there are any significant differences in the accuracy and speed of the learning processes. The comparison is made based on the accuracy of the predictions with different configurations and parameters. The two classifiers work in different ways and can be used efficiently for different purposes, and this paper presents data that will help decide whether either of them is better suited to this particular task.

Keywords
Hand-written digits, Handwritten signature verification, Handwritten character recognition, Neural networks

1. Introduction

Artificial neural networks are computing systems based on the anatomy of the human brain. The brain is made up of billions of neurons, interconnected by quadrillions of connections called synapses, which allow information to flow between the neurons. Because of this huge network of connections, the human brain still exceeds any supercomputer in terms of processing power. Its structure can be imitated by artificial networks, which use artificially created neurons and synapses that are able to process information and make predictions [1, 2, 3] and guesses based on the knowledge gained [4, 5]. This process is known as deep learning and is widely used today in voice recognition and driverless cars, and can be found in devices we use every day, such as smartphones or TVs [6]. In deep learning, a computer model learns classification directly from data in the form of images, sound or text. A well constructed model can even exceed human ability.

KNN, or k nearest neighbors, is a much simpler algorithm that classifies new data based on a similarity measure, usually a distance function, to the already stored cases [7]. The assumption made by this algorithm is that similar things exist close to each other, and it is most useful when that assumption is true. The idea behind it is to calculate the straight-line distance, also known as the Euclidean distance, between the input data and all the stored data representing the possible classes the input can be assigned to. It therefore relies on this data set being complete and is unable to recognize objects beyond what is already known to it.

There are many approaches to classifying text and handwritten characters, such as ranking-based evaluation [8], convolutional feature extraction [9] or a devoted benchmark [10]. This project aims to compare the two classifiers in the area of recognizing handwritten digits. This is an important area, as recognition of handwriting is used in a variety of modern devices. We test different configurations of parameters for both the neural network and KNN, looking for significant differences in performance in terms of accuracy and speed.

1.1. Related works

These two classifiers have been compared in various articles and areas of research.
In one of them, the classifiers were used to monitor the condition of machines by analyzing the vibration of journal bearings. In that research another technique, more commonly used in this area, was also included for comparison, and in the end the study concluded that the neural network performed better at this particular task. Another piece of research that should be mentioned compared these models at classifying magnetic resonance images (MRI). This is a very important area of research, as MRI scans are not always easy for a human to interpret, and automated analysis would increase the probability of a correct diagnosis. A special technique was used on the images to extract the most important features, which were then fed into a feed-forward back-propagation neural network and a KNN classifier. In that research KNN came out on top in terms of accuracy [11, 12, 13, 14, 15, 16]. As these examples show, both neural networks and KNN can perform well in different conditions, and only a close comparison can reveal which one performs better in a given area.

2. Mathematical model

For this project we needed various functions to modify the data as it passes between the neurons. The function used to sum the inputs from the previous layer of neurons is

Input = X₁·W₁ + X₂·W₂ + X₃·W₃ + … + Xₙ·Wₙ    (1)

where X represents the output from a given neuron and W is the weight attached to the synapse. After the input has been weighted and summed, it is modified by the activation function, which shapes the input so that the output from the model can be interpreted and used as needed. The function used in this project is the sigmoid function, shown in Figure 1:

f(x) = 1 / (1 + exp(−x))    (2)

Its derivative, used later for back propagation, is

f′(x) = (1 / (1 + exp(−x))) · (1 − 1 / (1 + exp(−x)))    (3)

Figure 1: Sigmoid function.

As seen on the graph, the function only takes values between 0 and 1, which is why it is used in models that predict a probability as their output. This makes it ideal for this project, as the aim is to produce a probability with which the input resembles one of the digits. For the back-propagation algorithm we need to calculate the error of the neurons in order to correctly modify the weights of the synapses. For the last layer of neurons the following equation was used:

ΔW = LR · (expected − output) · f′(x) · sⱼ    (4)

Here ΔW represents the change in weight, LR is the learning rate chosen for the network, f′(x) is the derivative of the activation function and sⱼ is the input from the j-th neuron of the previous layer. To support this equation we define an error gradient for each neuron. For the last layer it is calculated as

grad = f′(x) · (expected − output)    (5)

and for the other layers as

grad = f′(x) · Σᵢ₌₁ⁿ wᵢ·dᵢ    (6)

where each weight wᵢ is multiplied by the error gradient dᵢ of the i-th neuron connected to the current neuron. Based on the equations above, ΔW simplifies to

ΔW = LR · d · s    (7)

For KNN we used the Euclidean distance as the measure of similarity, calculated with the following equation:

D(X, Y) = √((X₁ − Y₁)² + … + (Xₙ − Yₙ)²)    (8)

Here X represents the known data and Y represents the input. Based on the calculated distances, the algorithm determines which class of objects appears most often among the k closest neighbors. The value of k was decided after experimenting with different values.
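To make these update rules concrete, the short Python sketch below applies equations (1), (2), (3), (5) and (7) to a single output layer. It is a minimal illustration under our own naming assumptions (sigmoid, output_layer_update, learning_rate and so on), not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    # Equation (2): squashes the weighted input into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # Equation (3): f'(x) = f(x) * (1 - f(x))
    s = sigmoid(x)
    return s * (1.0 - s)

def output_layer_update(weights, inputs, expected, learning_rate):
    """One back-propagation step for the output layer.

    weights:  (n_inputs, n_outputs) synapse weights
    inputs:   (n_inputs,) outputs of the previous layer (s_j in eq. 4)
    expected: (n_outputs,) target vector
    """
    x = inputs @ weights                                  # equation (1): weighted sum
    output = sigmoid(x)                                   # equation (2)
    grad = sigmoid_derivative(x) * (expected - output)    # equation (5)
    weights += learning_rate * np.outer(inputs, grad)     # equation (7): LR * d * s
    return output, grad
```

For a hidden layer the same pattern applies, except that the gradient is computed from the weighted gradients of the following layer, as in equation (6).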
3. Description of the proposed system

The deep learning process of the neural network is based on the idea of data being fed into the neurons and modified as it passes through the successive layers. To achieve this we have to create a model that uses various functions to modify the data appropriately. We start with the building block of the neural network, the neuron. A neuron receives information via synapses from the neurons connected to it. Each synapse has a weight attached to it, which is generated randomly when the synapse is first created.

The data that comes into a neuron is weighted using equation (1). This means that the input depends on both the output of the previous neuron and the weight of the synapse that connects them. When all the inputs have been summed, the data is modified by the activation function of the neuron; the function used in this project is the sigmoid function, for the reasons described previously. The neurons within the network are organized into layers. Two layers, the first and the last, are present in every neural network. The middle layers, known as hidden layers, are optional, but can be added to improve the accuracy of the learning process. The optimal number of hidden layers and the configuration of neurons within them were decided through testing of the network. For the input data we used bitmaps of 28x28 pixels, each representing one of the ten digits, so the first layer needed one neuron for each individual pixel of the picture. The number of neurons in the last layer equals the total number of classes that the input can be assigned to, which in this project was ten. The experiments we conducted to choose the number of hidden layers are described later in the article. To modify the weights of the synapses we used the back-propagation algorithm described in the previous section.

The KNN classifier was much simpler to set up. In this algorithm objects are classified based on their closeness to the data that the algorithm was trained on. When an input is presented, the Euclidean distance between it and all the stored examples is calculated. The distances are then sorted, with the smallest at the top of the list. The value of k determines how many of these distances are considered during the voting process, in which the algorithm checks which class appears most frequently among the chosen neighbors. This class is then given as the prediction for the input. Consequently, only the value of k has to be determined before setting up the classifier; it was chosen based on the tests described later.

The data used for this project was taken from the MNIST database and is split into a training set and a test set. We trained both classifiers on the training set and then used the test set to determine their accuracy, calculated as the fraction of cases in which the model chose the correct digit based on the information gained from the training data. We used this accuracy to compare the two models in different scenarios.
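As a minimal sketch of the voting procedure just described, the Python fragment below classifies one flattened 28x28 image against the stored training set using equation (8). The names (knn_predict, train_images and so on) are ours, chosen only for illustration.

```python
import numpy as np
from collections import Counter

def knn_predict(train_images, train_labels, query, k):
    """Classify one flattened image by a majority vote of its k nearest neighbours.

    train_images: (n_samples, 784) array of stored training bitmaps
    train_labels: (n_samples,) array of digit labels 0-9
    query:        (784,) flattened input image
    """
    # Equation (8): Euclidean distance from the query to every stored example
    distances = np.sqrt(np.sum((train_images - query) ** 2, axis=1))
    # Keep only the indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # The most frequent class among those neighbours is the prediction
    votes = Counter(train_labels[nearest])
    return votes.most_common(1)[0][0]
```

With k = 10, for example, the ten closest training images decide the predicted digit.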
4. Experiments

For our experiments we decided to create a second set of training data by augmenting the one we already had. This was done by randomly modifying each picture with one of four functions: rotation by 90 degrees to the left, rotation by 90 degrees to the right, turning the image upside down, or adding 'noise', which simply changes some of the pixels to black. The purpose was to see whether the random nature of these modifications would make a significant difference to the classifiers. The final data sets contained 60000 images for the non-augmented data and 120000 for the augmented data (the latter contained both the augmented and the non-augmented images).
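The four modifications could be implemented roughly as follows, assuming each image is a 28x28 NumPy array; the 5% noise level and the function names are our own assumptions, not values taken from the original program.

```python
import numpy as np

def augment(image, rng):
    """Apply one randomly chosen modification to a 28x28 image array."""
    choice = rng.integers(4)
    if choice == 0:
        return np.rot90(image, 1)               # rotate 90 degrees to the left
    if choice == 1:
        return np.rot90(image, -1)              # rotate 90 degrees to the right
    if choice == 2:
        return np.rot90(image, 2)               # turn the image upside down
    noisy = image.copy()                        # add 'noise': blacken some pixels
    mask = rng.random(image.shape) < 0.05       # assumed noise rate of 5%
    noisy[mask] = 0                             # assuming black is encoded as 0
    return noisy

# Example of building the augmented set (names are illustrative):
# rng = np.random.default_rng()
# augmented = [augment(img, rng) for img in train_images_2d]
```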
For the first test we decided to check how the number of epochs affects the accuracy of the neural network when using augmented and non-augmented data. For this we kept the learning rate at 0.3 and added a single hidden layer of 32 neurons. This setup was chosen to decrease the learning time and to allow us to complete further tests. We started the experiment at 1 epoch and increased the number gradually up to 10 epochs. For the augmented data we only went up to 5 epochs, because of the increased size of the data. The results are presented in Figure 2. From the diagram we can see that the accuracy varied between 93.5% and almost 96% for the non-augmented data.

Figure 2: Epoch number test.

The highest accuracy was reached at 9 epochs, and the biggest jump can be observed when going from 4 to 5 epochs. Above 5 epochs the accuracy does not fall below 95% and remains rather constant. The average accuracy for the non-augmented data is 95.05% with a standard deviation of 0.0071%. For the augmented data the accuracy is visibly lower: it varied between 90% and 92%, so there was slightly less variation, but this might be due to the lower number of epochs tested. The biggest difference between the two sets can be seen at 5 epochs, where it reaches 4.52%. The best accuracy, 91.68%, is observed at 3 epochs, and it seems to decrease past this number. The average accuracy for the augmented data was 90.92% with a standard deviation of 0.0059%. The average difference between the two sets across the tested epoch counts was 3.69%. Based on these values we can conclude that the neural network was better at recognizing data without augmentation, but even with the random modifications it was able to correctly guess the digit over 90% of the time. Across both sets the average accuracy was 93.67% with a standard deviation of 0.0212%.

Another test was designed to check how the number of neurons in the hidden layer influences the accuracy. In this test we kept the learning rate at 0.3 and set the number of epochs to one. We started with 10 neurons in the hidden layer; the results can be seen in Figure 3. As we increased the number of neurons the accuracy increased, ranging from 86.5% up to almost 95% for 200 neurons. We ran the test with 200 neurons to check whether a bigger increase in the neuron count leads to a larger increase in accuracy. The downside of increasing this number is a higher learning time, although the increase was small. The biggest improvement can be seen between 10 and 20 neurons, where it reached 3.80%, while between 50 and 200 neurons the improvement was only 0.45%. From this we can conclude that the number of neurons in the hidden layer can be kept quite small; this does not adversely affect the accuracy and leads to faster learning. Even with a number as small as 10 the accuracy is still 86.86%.

Figure 3: Hidden layers test 1.

In the next test we added another hidden layer. We kept the same learning rate and number of epochs as in the previous test and set the first hidden layer to 50 neurons. The second layer started with 10 neurons and we increased this number gradually. The results can be seen in Figure 4 and are less conclusive. The best accuracy of 93.49% was achieved at 30 neurons, which is lower than what was achieved with a single hidden layer of 40 neurons. The worst accuracy of 92.96% was achieved at 20 neurons, which is higher than the worst accuracy with one hidden layer. Adding another hidden layer also increases the learning time, which has to be taken into consideration. From this we conclude that adding another layer of neurons will not necessarily lead to higher accuracy, but might be a safer option. In our final setup we decided to keep a single hidden layer, mostly because of the shorter learning time.

Figure 4: Hidden layer test 2.

The next test was designed to check how changing the value of k influences the accuracy of the KNN classifier. This is the only parameter we could modify, so it is the only test done for KNN. We started with k=10 and increased the number from there. KNN took a very long time when we used the whole test set for the accuracy calculation, so we decided to use only 100 elements from the test set, while the distances were calculated against the whole training set. The results can be seen in Figure 5. As can be seen on the graph, we decided to increase the values drastically, because a gradual increase led to insubstantial differences. The accuracy stays the same for low values of k, namely below 30, and then decreases as this number goes up; the decrease becomes more rapid as we reach values of 1000 and more. KNN handled the non-augmented data better, as was the case with the neural network, but here the differences varied between 1 and 2%, and at k=1000 the accuracy for the augmented data was actually higher, at 86% compared to 84%. The highest observed accuracy was higher for KNN, at 96.00% compared to 95.65% for the neural network. From this experiment we can conclude that lower values of k lead to higher accuracy of the classification process. This makes sense, as a higher value of k means that larger distances are taken into consideration during the voting process.

Figure 5: K value test.
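A possible way to script the k sweep on the 100-element test subset is sketched below. It reuses the knn_predict function from the sketch in Section 3, and the list of k values in the comment is only an example, not the exact grid we used.

```python
def accuracy_for_k(train_images, train_labels, test_images, test_labels, k, subset=100):
    """Fraction of correctly classified digits on a small test subset."""
    correct = 0
    for image, label in zip(test_images[:subset], test_labels[:subset]):
        if knn_predict(train_images, train_labels, image, k) == label:
            correct += 1
    return correct / subset

# Example sweep (illustrative values of k and data set names):
# for k in (10, 30, 100, 300, 1000, 3000):
#     print(k, accuracy_for_k(X_train, y_train, X_test, y_test, k))
```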
For comparison we collected the data from the experiments described previously and put it together on a single diagram, shown in Figure 6. As can be seen on the graph, the accuracy for KNN at low values of k is higher than that of the neural network. Increasing the number of epochs led to higher accuracy, which cannot be said for increasing the value of k. The biggest differences can be observed with the augmented data, where KNN achieved higher accuracy than the neural network, with the largest difference being 5%; at 3 epochs this difference is smaller. Based on this data we can say that KNN should, in theory, be better at correctly predicting the answer.

Figure 6: Combined results of previous tests.

We designed one last experiment to test this. We drew each digit 20 times and tested each classifier with and without the use of augmentation. In the table in Figure 7 we recorded the most prevalent answer for each case. From the table we can see that both classifiers were able to recognize the digits 0, 2, 3, 5 and 8 quite well, but the digits 7 and 9 posed a difficulty. The neural network was able to correctly guess the digit 80% of the time with both data sets, while the KNN classifier guessed correctly 70% of the time with both sets. The differences were not big enough to safely conclude which classifier was better; preferably this test would be carried out 100 or more times to give more meaningful results.

Figure 7: Table of results of the drawing test.

5. Conclusions

After carrying out the project and analyzing the data, it cannot be safely concluded that one classifier is better than the other in the area of recognizing handwritten digits. Although the tests that were conducted did show some differences in accuracy, especially when the data was augmented, not enough tests were performed to show that the differences between the classifiers were significant. What can be concluded from these tests is that there is indeed a difference in how they perform. KNN seemed to handle augmented data better, probably because all the training data was available to it for comparison with the test data. The neural network, on the other hand, increased in accuracy as we increased the number of epochs, since this allowed the weights of the synapses to be adjusted according to the errors in judgment the network made. With the back-propagation algorithm the network gets better each time we train it on the training data, whereas the KNN algorithm only needs to be given the training data once for it to work. Both classifiers show quite high accuracy when dealing with our data, but with the correct configuration this would preferably be increased to very close to 100%. More tests would have to be done, mostly to find the best possible setups for both models and to produce more statistically significant data. Finally, in the testing phase we noticed that there is a slight improvement in accuracy if the images are pre-filtered [17].
References

[1] M. Woźniak, D. Połap, Intelligent home systems for ubiquitous user support by using neural networks and rule-based approach, IEEE Transactions on Industrial Informatics 16 (2019) 2651–2658.
[2] C. Napoli, F. Bonanno, G. Capizzi, An hybrid neuro-wavelet approach for long-term prediction of solar wind, Proceedings of the International Astronomical Union 6 (2010) 153–155.
[3] C. Napoli, F. Bonanno, G. Capizzi, Exploiting solar wind time series correlation with magnetospheric response by using an hybrid neuro-wavelet approach, Proceedings of the International Astronomical Union 6 (2010) 156–158.
[4] G. Capizzi, G. L. Sciuto, C. Napoli, M. Woźniak, G. Susi, A spiking neural network-based long-term prediction system for biogas production, Neural Networks 129 (2020) 271–279.
[5] G. Cardarilli, L. Di Nunzio, R. Fazzolari, A. Nannarelli, M. Petricca, M. Re, Design space exploration based methodology for residue number system digital filters implementation, IEEE Transactions on Emerging Topics in Computing (2020).
[6] M. Woźniak, D. Połap, Soft trees with neural components as image-processing technique for archeological excavations, Personal and Ubiquitous Computing 24 (2020) 363–375.
[7] M. Husnain, S. Mumtaz, M. Coustaty, M. Luqman, J.-M. Ogier, S. Malik, Urdu handwritten text recognition: A survey, IET Image Processing (2020).
[8] N. D. Cilia, C. De Stefano, F. Fontanella, A. S. di Freca, A ranking-based feature selection approach for handwritten character recognition, Pattern Recognition Letters 121 (2019) 77–86.
[9] H.-h. Zhao, H. Liu, Multiple classifiers fusion and CNN feature extraction for handwritten digits recognition, Granular Computing (2019) 1–8.
[10] A. Islam, F. Rahman, A. S. A. Rabby, Sankhya: An unbiased benchmark for Bangla handwritten digits recognition, in: 2019 IEEE International Conference on Big Data (Big Data), IEEE, 2019, pp. 4676–4683.
[11] A. Venckauskas, A. Karpavicius, R. Damaševičius, R. Marcinkevičius, J. Kapočiūte-Dzikiené, C. Napoli, Open class authorship attribution of Lithuanian internet comments using one-class classifier, in: 2017 Federated Conference on Computer Science and Information Systems (FedCSIS), IEEE, 2017, pp. 373–382.
[12] A. Mishra, K. Kumar, P. Kumar, P. Mittal, A novel approach for handwritten character recognition using k-NN classifier, in: Soft Computing: Theories and Applications, Springer, 2020, pp. 887–894.
[13] C. Napoli, E. Tramontana, G. L. Sciuto, M. Wozniak, R. Damaevicius, G. Borowik, Authorship semantical identification using holomorphic Chebyshev projectors, in: 2015 Asia-Pacific Conference on Computer Aided System Engineering, IEEE, 2015, pp. 232–237.
[14] C. Napoli, G. Pappalardo, E. Tramontana, An agent-driven semantical identifier using radial basis neural networks and reinforcement learning, arXiv preprint arXiv:1409.8484 (2014).
[15] T. Jadhav, Handwritten signature verification using local binary pattern features and KNN, International Research Journal of Engineering and Technology (IRJET) 6 (2019) 579–586.
[16] M. Wróbel, J. T. Starczewski, C. Napoli, Handwriting recognition with extraction of letter fragments, in: International Conference on Artificial Intelligence and Soft Computing, Springer, 2017, pp. 183–192.
[17] G. Capizzi, S. Coco, G. Sciuto, C. Napoli, A new iterative FIR filter design approach using a Gaussian approximation, IEEE Signal Processing Letters 25 (2018) 1615–1619.