Simply Pattern Recognition as a Tool for Identity Verification

Karolina Kęsik
Institute of Mathematics
Silesian University of Technology
Kaszubska 23, 44-100 Gliwice, Poland
karola.ksk@gmail.com

Copyright held by the author(s).

Abstract—The rapid development of mobile and smart technologies has made voice recognition, and even voice analysis, much more needed in recent years. For this reason, this paper presents an approach to voice recognition. The proposed idea is based on the classic approach called pattern matching. What distinguishes the technique is that a sound sample is presented in the form of a spectrogram (a 2D image). Feature extraction is then performed not on the sound but on the image, which allows the pattern to be built first and the classification to be carried out afterwards. In addition, the matching process is supported by the k-nearest neighbors technique. The entire process is described, tested and discussed.

I. INTRODUCTION

The Internet of Things is a concept that has become a driving force behind technological development. The increasing need to simplify and improve everyday life mobilizes not only companies offering various types of equipment or software with the "smart" label, but also researchers. They focus on developing particular aspects that are components of any technique that is later adopted by industry and hence distributed to our homes. The most important topics in this area are widely described in [6], [10], [19], whose authors focus on emerging issues that matter from the industrial point of view.

It is hard to tell which components of large systems are the most important. Therefore, all are treated equally and developed at a similar level. Each piece of software is installed on specific devices and is a link between the user and the hardware. In systems labeled "smart", various sensors are used to acquire knowledge about the environment. An example is a motion sensor or a camera that records an image, which is then used to find a deviation from the norm or the appearance of movement. One video processing idea was presented in [17], where the authors described video tamper detection based on multi-scale mutual information. Other sensors are microphones that record sound and voice. Sound recording devices make it possible to receive voice commands, which is especially important for people with disabilities. First, the analog signal must be converted to a discrete one (because it is processed by computers), so signal processing is a critical issue. In [2], [13], [16], the idea of using discrete and wavelet transforms to obtain an audio signal in a form ready for analysis was shown. In [11], [12], a technique for extracting specific parts of signals was shown. The method was tested on some common voice distortions such as a cough. It is useful in authorization systems where a record is created and only the first/last name is required for the verification process.

Of course, systems that operate on data obtained from many sensors need an algorithm to gather and process all this information. If the system works in real time, a lot of data arrives every second. This means that the software will not be able to process everything at the same time; hence the idea of using parallelization or assigning weights to incoming data. A queuing service is a stochastic model that can govern how data from the sensors are handled. An example of such a model is shown in [18]. Large amounts of data need fast sorting algorithms, not only for sorting but also for searching for specific information in the database where all incoming data are stored. Among the latest achievements in this area are algorithms adapted to multi-threaded processors [8], [9]. Many new methods are based on artificial intelligence, such as neural networks or swarm intelligence [7], [20], [22]. All of these components are necessary in large systems, but such systems also need security against uncontrolled access to data or computational processes. Important work in this area is presented in [14], where almost all aspects and challenges of the Internet of Things are described and discussed.

In this paper, an identity verification process is described, together with background on converting an audio signal to a form that allows analysis.

II. SIGNAL THEORY

The processed signal should be given in a discrete form, especially when the operations are performed by a computer. In practice, an analog signal has to be converted to its discrete equivalent. Unfortunately, even such a version is of little practical use on its own. The signal must therefore undergo a transformation that brings it to a form suitable for analysis. One of the best known transformations is the Fourier transform. Suppose that $s(n) = (s_0, s_1, s_2, \ldots, s_{N-1})$ is a signal. Its transformation gives $(S_0, S_1, S_2, \ldots, S_{N-1})$, where $S_i \in \mathbb{C}$, and is computed by

$$
S_k = \sum_{n=0}^{N-1} s_n \exp\left(-\frac{2\pi i n k}{N}\right), \quad 0 \le k \le N-1. \tag{1}
$$

Figure 1: Spectrograms for three different samples belonging to one person pronouncing "James Tiberius Kirk".

While the discrete Fourier transform made calculations possible on various machines, the operation time was still too long. In the 1960s, two American scientists, James W. Cooley and John W. Tukey, presented the Fast Fourier Transform [3], [4], a technique for computing the transform using recursion and the divide-and-conquer method. The whole idea is based on splitting the sum into even and odd indices in the following way

$$
\begin{aligned}
S_k &= \sum_{n=0}^{N-1} s_n \exp\left(-\frac{2\pi i n k}{N}\right) \\
&= \sum_{m=0}^{N/2-1} s_{2m} \exp\left(-\frac{2\pi i k (2m)}{N}\right) + \sum_{m=0}^{N/2-1} s_{2m+1} \exp\left(-\frac{2\pi i k (2m+1)}{N}\right) \\
&= \sum_{m=0}^{N/2-1} s_{2m} \exp\left(-\frac{2\pi i k m}{N/2}\right) + \exp\left(-\frac{2\pi i k}{N}\right) \sum_{m=0}^{N/2-1} s_{2m+1} \exp\left(-\frac{2\pi i k m}{N/2}\right).
\end{aligned} \tag{2}
$$
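As a minimal sketch of the even/odd split in Eq. (2), the following Python code evaluates the transform of Eq. (1) directly and then recursively, and compares the two. The function names `dft` and `fft_recursive` and the sample values are chosen here for illustration and do not come from the paper; the recursive version assumes that the signal length is a power of two.

```python
import cmath


def dft(signal):
    """Direct evaluation of Eq. (1); needs on the order of N^2 operations."""
    n_samples = len(signal)
    return [
        sum(
            signal[n] * cmath.exp(-2j * cmath.pi * n * k / n_samples)
            for n in range(n_samples)
        )
        for k in range(n_samples)
    ]


def fft_recursive(signal):
    """Recursive evaluation of Eq. (2); assumes len(signal) is a power of two."""
    n_samples = len(signal)
    if n_samples == 1:
        return [complex(signal[0])]
    if n_samples % 2:
        raise ValueError("signal length must be a power of two")
    even = fft_recursive(signal[0::2])  # transform of s_0, s_2, s_4, ...
    odd = fft_recursive(signal[1::2])   # transform of s_1, s_3, s_5, ...
    half = n_samples // 2
    result = [0j] * n_samples
    for k in range(half):
        # exp(-2*pi*i*k/N) is the factor in front of the odd sum in Eq. (2)
        twiddle = cmath.exp(-2j * cmath.pi * k / n_samples) * odd[k]
        result[k] = even[k] + twiddle
        # For k + N/2 the half-length sums repeat and the factor changes sign
        result[k + half] = even[k] - twiddle
    return result


# Hypothetical test signal with 8 samples
samples = [0.0, 1.0, 0.0, -1.0, 0.5, 0.25, -0.5, -0.25]
direct = dft(samples)
fast = fft_recursive(samples)
max_difference = max(abs(a - b) for a, b in zip(direct, fast))
print(f"largest difference between Eq. (1) and Eq. (2): {max_difference:.2e}")
```

Both functions return the same coefficients up to floating-point rounding. The recursive version halves the problem at every level, which is where the reduction in operation time comes from.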
It is possible to analyze the sound in graphic form, but for this purpose the signal should be represented by the so-called short-time Fourier transform

$$
S\{s[n]\}(m, f) \equiv S(m, f) = \sum_{n=-\infty}^{\infty} s[n]\, w[n-m] \exp(-j f n), \tag{3}
$$

where $w$ is the window function. Using the above equation, the signal can be presented as a graph of the amplitude spectrum, which is determined as

$$
\mathrm{spectrogram}\{s(t)\}(t, f) \equiv |S(t, f)|^2. \tag{4}
$$

Presenting the values calculated from Eq. (4) on a 2D graph gives points and their values. There are two axes: OX, which represents time, and OY, which represents frequency. The value at a given point is represented by a shade of color, which is understood as intensity. Sample graphs are shown in Fig. 1.

III. PATTERN RECOGNITION

Let us consider a spectrogram as a set of points $(x, y)$ with intensity in the range $\langle 0, 1 \rangle$. On the spectrogram, the most important features have the brightest shade, so their intensity takes the smallest values.

Let us first focus on the pattern creation process. A newly hired employee is asked to repeat his/her name at least 10 times. Each repetition is one recording. Then, 10 spectrograms are taken and used to create a pattern based on these recordings. In an ideal world, all samples would look similar. In practice it is not so easy, because recordings can be of poor quality and contain noise, among many other factors. For each sample, we find the value $z_{\max}$ with the lowest saturation. This value allows us to define the threshold $\mu$ as

$$
\mu = \begin{cases} 1.2 \cdot z_{\max} & \text{if } z_{\max} \neq 0, \\ 0.2 & \text{if } z_{\max} = 0. \end{cases} \tag{5}
$$

Using $\mu$, it is possible to create a vector of the most characteristic points (all points satisfying the threshold condition of Eq. (5)) in the following form

$$
\xi^k = \left[ \left(x_0^k, y_0^k, z_0^k\right), \left(x_1^k, y_1^k, z_1^k\right), \ldots, \left(x_m^k, y_m^k, z_m^k\right) \right], \tag{6}
$$

where $k$ is the number of a specific recording, and $(x_0^k, y_0^k)$ is the point with the lowest saturation, equal to $z_0^k$.

In this way, one set $\xi^k$ is created for each recording. All values are grouped by the k-nearest neighbors classifier in order to remove points that lie at a short distance from each other in each set. A probability estimator is defined as

$$
\hat{p}(k \mid x) = \frac{1}{K} \sum_{i=1}^{n} I\left(\rho(x, x_i) \le \rho\left(x, x^{(K)}\right)\right) I(y_i = k), \quad k = 1, \ldots, L, \tag{7}
$$

where $\rho(\cdot)$ is a metric and $x^{(K)}$ is the $K$-th nearest sample to the point $x$. Using these, the classifier is formulated as

$$
\hat{d}_{KNN}(x) = \arg\max_{k} \hat{p}(k \mid x). \tag{8}
$$

After analyzing the points, the sets may have different numbers of elements. To fix this, the sets are pruned to the size of the smallest set. Then, all sets $\xi^k$ define intervals for the points in the pattern. The limits of these intervals are determined as

$$
\begin{gathered}
\left[\min\{x_0^k\}, \max\{x_0^k\}\right], \left[\min\{y_0^k\}, \max\{y_0^k\}\right], \left[\min\{z_0^k\}, \max\{z_0^k\}\right], \ldots, \\
\left[\min\{x_m^k\}, \max\{x_m^k\}\right], \left[\min\{y_m^k\}, \max\{y_m^k\}\right], \left[\min\{z_m^k\}, \max\{z_m^k\}\right], \\
k \in \{1, 2, \ldots, 10\}.
\end{gathered} \tag{9}
$$

IV. EXPERIMENTS

The proposed method was tested on a small dataset consisting of only 60 samples, half of which contained the three words "James Tiberius Kirk" spoken by one person who is identified with this data (the so-called owner). The remaining 30 samples were created by three different people (the so-called counterfeiters), each of whom created 10 samples.

Using only 10 samples from the owner (selected randomly), the pattern was modeled. Then, all samples in the database were checked for a pattern match. If the compatibility was at least 80%, the sample was marked as the owner's. Otherwise, the sample was marked as a falsification.

The effectiveness of the proposed technique was verified by grouping the samples as $TP$ (true positive), $TN$ (true negative), $FP$ (false positive) and $FN$ (false negative). For the results divided in this way, accuracy was calculated as $\Gamma$, Dice's coefficient as $\Lambda$, overlap as $\Psi$, sensitivity as $\Upsilon$ and specificity as $\Phi$ according to

$$
\Gamma = \frac{TP + FN}{TP + TN + FP + FN}, \tag{10}
$$

$$
\Lambda = \frac{2TP}{2TP + FP + FN}, \tag{11}
$$

$$
\Psi = \frac{TP}{TP + FP + FN}, \tag{12}
$$

$$
\Upsilon = \frac{TP}{TP + FN}, \tag{13}
$$

$$
\Phi = \frac{TN}{TN + FP}. \tag{14}
$$

Table I: Obtained solutions for voice recognition

TP   TN   FP   FN   Γ      Λ      Ψ      Υ      Φ
27   3    6    24   0.85   0.64   0.47   0.53   0.33

Figure 2: A graphical summary of the obtained results for all samples in the test database.

Figure 3: Confusion matrix for the owner's sample classification.

Figure 4: Confusion matrix for the first counterfeiter's sample classification.

Figure 5: Confusion matrix for the second counterfeiter's sample classification.

Figure 6: Confusion matrix for the third counterfeiter's sample classification.

The distribution of correctly and incorrectly classified samples is presented in Figs. 2, 3, 4, 5 and 6. In the case of the owner, only 10% of correct samples were incorrectly classified. Noise or recording time may be the reason. For the samples made by the three different counterfeiters, the average rate of fraud detection was 80%, which is a good result considering the number of samples. A more detailed analysis of the measurements is shown in Tab. I, where the average effectiveness is 85%. The similarity coefficient reached 0.64, which is quite a high value. However, it is worth noting that the obtained data should be regarded as lying within a fairly wide error range. The same applies to the other coefficients: the probability of obtaining a negative classification assuming that the sample is true is 0.33, and the probability of positive verification for fraud is 0.53. The obtained results indicate a high degree of effectiveness despite the number of sound samples as well as the extraction and classification technique itself.

V. CONCLUSIONS

In this paper, the idea of audio analysis based on the mechanism of pattern matching with k-nearest neighbors was presented. It is important to develop a wider variety of security techniques in order to reduce the number of calculations, simplify operation and increase the precision of actions. The technique was implemented and tested on a small dataset consisting of only 60 samples. Half of them belonged to one person (called the owner), and the rest to three other people; these were forgeries used for verification purposes. Due to noise and different recording times, the program incorrectly classified 10% of the true records. However, this does not change the fact that the effectiveness of the proposed idea reached almost 85%. It is worth noting that it was tested for $k = 4$, and increasing the number of neighbors resulted in a decrease in the correctness of classification, which may be due to the number of samples.

An important aspect of further research is extending the database with a much larger number of recordings and increasing the amount of noise or voice problems of the recorded person. It is particularly important to be able to bypass hoarseness or remove a cough. In terms of accuracy, the use of other, more complicated classification methods (such as neural networks) may prove to be a much more favorable approach.
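The pattern-building steps of Section III (the threshold of Eq. (5), pruning to the smallest set, the intervals of Eq. (9)) and the 80% rule of Section IV can be sketched as follows. This is only an illustration under several assumptions that are not stated in the paper:

- a spectrogram is given as a list of (x, y, z) points with z in [0, 1];
- the zero case of Eq. (5) is read as a test on z_max;
- characteristic points are those with intensity not exceeding µ, ordered from the smallest intensity;
- "compatibility" is taken to be the share of pattern points whose intervals contain the corresponding point of the tested sample.

The k-nearest neighbors thinning step is omitted, and all function names and data values are hypothetical.

```python
def characteristic_points(points):
    """Select points under the threshold of Eq. (5), lowest intensity first.

    Assumption: the condition is read as z <= mu, with the zero test
    applied to z_max (the smallest intensity in the spectrogram).
    """
    z_max = min(z for _, _, z in points)
    mu = 1.2 * z_max if z_max != 0 else 0.2
    selected = [p for p in points if p[2] <= mu]
    return sorted(selected, key=lambda p: p[2])


def build_pattern(recordings):
    """Prune all sets to the smallest size and form the intervals of Eq. (9)."""
    sets = [characteristic_points(points) for points in recordings]
    size = min(len(s) for s in sets)
    sets = [s[:size] for s in sets]
    pattern = []
    for j in range(size):
        # One (min, max) interval for each of x, y and z of the j-th point
        pattern.append(tuple(
            (min(s[j][c] for s in sets), max(s[j][c] for s in sets))
            for c in range(3)
        ))
    return pattern


def compatibility(pattern, points):
    """Assumed score: share of pattern points matched by the tested sample."""
    candidate = characteristic_points(points)[:len(pattern)]
    hits = sum(
        all(low <= value <= high for value, (low, high) in zip(point, slot))
        for point, slot in zip(candidate, pattern)
    )
    return hits / len(pattern)


def is_owner(pattern, points, required=0.8):
    """Apply the 80% compatibility rule from Section IV."""
    return compatibility(pattern, points) >= required


# Hypothetical toy data: two owner recordings, then two tested samples
recordings = [
    [(10, 5, 0.10), (12, 7, 0.11), (30, 9, 0.80)],
    [(11, 5, 0.10), (13, 8, 0.11), (31, 9, 0.75)],
]
pattern = build_pattern(recordings)
genuine = [(10, 5, 0.10), (13, 7, 0.11), (29, 9, 0.90)]
forged = [(40, 20, 0.10), (42, 22, 0.11), (5, 5, 0.90)]
print(is_owner(pattern, genuine), is_owner(pattern, forged))
```

With real spectrograms the point sets would be much larger, and the way points are paired between recordings would matter far more than in this toy example.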
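The measures of Eqs. (10)-(14) can be checked against Table I with a short sketch. The function name `evaluate` is illustrative and not taken from the paper. The formulas are coded exactly as printed, including the numerator TP + FN in Eq. (10); the conventional definition of accuracy uses TP + TN in the numerator instead.

```python
def evaluate(tp, tn, fp, fn):
    """Compute the five measures as printed in Eqs. (10)-(14)."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + fn) / total,        # Eq. (10), as printed
        "dice": 2 * tp / (2 * tp + fp + fn),  # Eq. (11)
        "overlap": tp / (tp + fp + fn),       # Eq. (12)
        "sensitivity": tp / (tp + fn),        # Eq. (13)
        "specificity": tn / (tn + fp),        # Eq. (14)
    }


# Counts taken from Table I
table_one = evaluate(tp=27, tn=3, fp=6, fn=24)
for name, value in table_one.items():
    print(f"{name}: {value:.2f}")
```

With the counts from Table I, the printed formulas give 0.85, 0.64, 0.47, 0.53 and 0.33, which agrees with the values reported in the table.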
REFERENCES

[1] F. Beritelli, G. Capizzi, G. L. Sciuto, C. Napoli, and F. Scaglione. Automatic heart activity diagnosis based on Gram polynomials and probabilistic neural networks. Biomedical Engineering Letters, 8(1):77–85, 2018.
[2] D. Birvinskas, V. Jusas, I. Martisius, and R. Damasevicius. EEG dataset reduction and feature extraction using discrete cosine transform. In Computer Modeling and Simulation (EMS), 2012 Sixth UKSim/AMSS European Symposium on, pages 199–204. IEEE, 2012.
[3] W. T. Cochran, J. W. Cooley, D. L. Favin, H. D. Helms, R. A. Kaenel, W. W. Lang, G. Maling, D. E. Nelson, C. M. Rader, and P. D. Welch. What is the fast Fourier transform? Proceedings of the IEEE, 55(10):1664–1674, 1967.
[4] J. W. Cooley, P. A. Lewis, and P. D. Welch. The fast Fourier transform and its applications. IEEE Transactions on Education, 12(1):27–34, 1969.
[5] R. Damaševičius, C. Napoli, T. Sidekerskienė, and M. Woźniak. IMF mode demixing in EMD for jitter analysis. Journal of Computational Science, 22:240–252, 2017.
[6] J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami. Internet of things (IoT): A vision, architectural elements, and future directions. Future Generation Computer Systems, 29(7):1645–1660, 2013.
[7] P. N. Mahalle, P. A. Thakre, N. R. Prasad, and R. Prasad. A fuzzy approach to trust based access control in internet of things. In Wireless Communications, Vehicular Technology, Information Theory and Aerospace & Electronic Systems (VITAE), 2013 3rd International Conference on, pages 1–5. IEEE, 2013.
[8] Z. Marszałek. Parallelization of fast sort algorithm. In International Conference on Information and Software Technologies, pages 408–421. Springer, 2017.
[9] Z. Marszałek. Parallelization of modified merge sort algorithm. Symmetry, 9(9):176, 2017.
[10] C. Perera, C. H. Liu, and S. Jayawardena. The emerging internet of things marketplace from an industrial perspective: A survey. IEEE Transactions on Emerging Topics in Computing, 3(4):585–598, 2015.
[11] D. Połap. Extraction of specific data from a sound sample by removing additional distortion. In Computer Science and Information Systems (FedCSIS), 2017 Federated Conference on, pages 353–356. IEEE, 2017.
[12] D. Połap and M. Woźniak. Extraction and analysis of voice samples based on short audio files. In International Conference on Information and Software Technologies, pages 422–431. Springer, 2017.
[13] N. Romano, A. Scivoletto, and D. Połap. A real-time audio compression technique based on fast wavelet filtering and encoding. In Computer Science and Information Systems (FedCSIS), 2016 Federated Conference on, pages 497–502. IEEE, 2016.
[14] S. Sicari, A. Rizzardi, L. A. Grieco, and A. Coen-Porisini. Security, privacy and trust in internet of things: The road ahead. Computer Networks, 76:146–164, 2015.
[15] J. T. Starczewski, S. Pabiasz, N. Vladymyrska, A. Marvuglia, C. Napoli, and M. Woźniak. Self organizing maps for 3D face understanding. In International Conference on Artificial Intelligence and Soft Computing, pages 210–217. Springer, 2016.
[16] M. Vasiljevas, R. Turčinas, and R. Damaševičius. Development of EMG-based speller. In Proceedings of the XV International Conference on Human Computer Interaction, page 7. ACM, 2014.
[17] W. Wei, X. Fan, H. Song, and H. Wang. Video tamper detection based on multi-scale mutual information. Multimedia Tools and Applications, pages 1–18, 2017.
[18] W. Wei, Q. Xu, L. Wang, X. Hei, P. Shen, W. Shi, and L. Shan. GI/Geom/1 queue based on communication model for mesh networks. International Journal of Communication Systems, 27(11):3013–3029, 2014.
[19] A. Whitmore, A. Agarwal, and L. Da Xu. The internet of things—a survey of topics and trends. Information Systems Frontiers, 17(2):261–274, 2015.
[20] M. Woźniak and D. Połap. Adaptive neuro-heuristic hybrid model for fruit peel defects detection. Neural Networks, 98:16–33, 2018.
[21] M. Woźniak, D. Połap, G. Borowik, and C. Napoli. A first attempt to cloud-based user verification in distributed system. In Asia-Pacific Conference on Computer Aided System Engineering (APCASE), pages 226–231. IEEE, 2015.
[22] M. Woźniak, D. Połap, L. Kośmider, and T. Cłapa. Automated fluorescence microscopy image analysis of Pseudomonas aeruginosa bacteria in alive and dead stadium. Engineering Applications of Artificial Intelligence, 67:100–110, 2018.