Simply Pattern Recognition as a Tool for Identity Verification

Karolina Kęsik
Institute of Mathematics
Silesian University of Technology
Kaszubska 23, 44-100 Gliwice, Poland
karola.ksk@gmail.com


Copyright held by the author(s).

Abstract: The rapid development of mobile and smart technologies has made voice recognition and voice analysis increasingly necessary in recent years. For this reason, this paper presents an approach to voice recognition based on the classic technique of pattern matching. What distinguishes the technique is that a sound sample is represented as a spectrogram (a 2D image). Feature extraction is then performed on the image rather than on the sound, which allows the pattern to be built first and the classification to be carried out afterwards. In addition, the matching process is supported by the k-nearest neighbors technique. The entire process is described, tested, and discussed.

I. INTRODUCTION

The Internet of Things is a concept that has become a driving force behind technological development. The growing need to simplify and improve everyday life motivates not only companies offering various types of equipment or software labeled as smart, but also researchers. They focus on developing particular components of techniques that are later adopted by industry and, in turn, delivered to our homes. The most important issues in this area are widely described in [6], [10], [19], whose authors focus on emerging issues that matter from the industrial point of view.

It is hard to tell which components of a large system are the most important. Therefore, all of them are treated equally and developed at a similar level. Software is installed on specific devices and acts as a link between the user and the hardware. In smart systems, various sensors are used to acquire knowledge about the environment. Examples are a motion sensor or a camera that records an image, which is then used to detect a deviation from the norm or the appearance of movement. One video processing idea was presented in [17], where the authors described video tamper detection based on multi-scale mutual information. Other sensors are microphones, which record sound and voice. Sound recording devices make it possible to receive voice commands, which are especially important for people with disabilities. The analog signal must first be converted to a discrete one (because it is processed by computers), so signal processing is a critical issue. In [2], [13], [16], discrete and wavelet transforms were used to obtain an audio signal in a form ready for analysis. In [11], [12], a technique for extracting specific parts of a signal was shown. The method was tested on common voice distortions such as a cough. It is useful in authorization systems where a record is created and only the first and last name are required for the verification process.

Of course, to operate on data obtained from many sensors, these systems need an algorithm that gathers and processes all of this information. If the system works in real time, a large amount of data arrives every second. The software cannot process all of it at once, hence the idea of using parallelization or assigning weights to incoming data. A queuing system is a stochastic model that can direct how data from the sensors are handled; an example of such a model is shown in [18]. Large amounts of data require fast sorting algorithms, not only for sorting itself but also for searching for specific information in the database where all incoming data are stored. Among the latest achievements in this area are algorithms adapted to multi-threaded processors [8], [9]. Many new methods are based on artificial intelligence, such as neural networks or swarm intelligence [7], [20], [22]. All of these components are necessary in large systems, but such systems also need protection against uncontrolled access to data or computational processes. Important work in this area is presented in [14], where almost all aspects and challenges of the Internet of Things are described and discussed.

In this paper, an identity verification process is described, together with background on converting an audio signal into a form that allows analysis.

II. SIGNAL THEORY

The processed signal should be given in a discrete form, especially when the operations are performed by a computer. In practice, an analog signal has to be converted to its discrete equivalent. Unfortunately, even this version is of little practical use on its own. The signal must therefore undergo a transformation that brings it into a form suitable for analysis. One of the best-known transformations is the Fourier transform. Suppose that $s(n) = (s_0, s_1, s_2, \ldots, s_{N-1})$ is a signal. Transforming such a sequence gives $(S_0, S_1, S_2, \ldots, S_{N-1})$, where $S_i \in \mathbb{C}$, and it is done
by

$$S_k = \sum_{n=0}^{N-1} s_n \exp\left(-\frac{2\pi i n k}{N}\right), \qquad 0 \le k \le N-1. \tag{1}$$
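As a minimal sketch of Eq. (1), the transform can be evaluated directly from its definition. The function name `dft` and the test signal are illustrative assumptions, not part of the described method. The double loop needs on the order of $N^2$ operations.

```python
import cmath


def dft(signal):
    """Evaluate Eq. (1) directly: S_k = sum_n s_n * exp(-2*pi*i*n*k/N)."""
    n_samples = len(signal)
    spectrum = []
    for k in range(n_samples):
        total = 0j
        for n, s_n in enumerate(signal):
            total += s_n * cmath.exp(-2j * cmath.pi * n * k / n_samples)
        spectrum.append(total)
    return spectrum


# Hypothetical test signal: a constant sequence, whose energy
# ends up entirely in the k = 0 coefficient.
spectrum = dft([1.0, 1.0, 1.0, 1.0])
```

For the constant signal used here, only the coefficient $S_0$ is non-zero, which is a quick sanity check of the sign and index conventions.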
While the discrete Fourier transform made such calculations possible on various machines, the computation time was still too long. In the 1960s, two American scientists, James W. Cooley and John W. Tukey, presented the Fast Fourier Transform [3], [4], a technique that computes the transform using recursion and the divide-and-conquer method. The whole idea is based on splitting the sum into even and odd indices in the following way
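$$
\begin{aligned}
S_k &= \sum_{n=0}^{N-1} s_n \exp\left(-\frac{2 i \pi n k}{N}\right) \\
    &= \sum_{m=0}^{N/2-1} s_{2m} \exp\left(-\frac{2 i \pi k (2m)}{N}\right)
     + \sum_{m=0}^{N/2-1} s_{2m+1} \exp\left(-\frac{2 i \pi k (2m+1)}{N}\right) \\
    &= \sum_{m=0}^{N/2-1} s_{2m} \exp\left(-\frac{2 i \pi k m}{N/2}\right)
     + \exp\left(-\frac{2 i \pi k}{N}\right) \sum_{m=0}^{N/2-1} s_{2m+1} \exp\left(-\frac{2 i \pi k m}{N/2}\right).
\end{aligned} \tag{2}
$$

Figure 1: Spectrograms for three different samples belonging to one person pronouncing "James Tiberius Kirk".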
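The even/odd split in Eq. (2) can be sketched as a recursive radix-2 routine. This is an illustrative sketch that assumes the signal length is a power of two. The function name `fft` is hypothetical, and the code is not the implementation used in the experiments.

```python
import cmath


def fft(signal):
    """Recursive radix-2 FFT following the even/odd split of Eq. (2).

    Assumes len(signal) is a power of two.
    """
    n_samples = len(signal)
    if n_samples == 1:
        return [complex(signal[0])]
    even = fft(signal[0::2])  # half-length transform of s_{2m}
    odd = fft(signal[1::2])   # half-length transform of s_{2m+1}
    half = n_samples // 2
    spectrum = [0j] * n_samples
    for k in range(half):
        # Factor exp(-2*i*pi*k/N) in front of the odd sum in Eq. (2).
        twiddle = cmath.exp(-2j * cmath.pi * k / n_samples) * odd[k]
        spectrum[k] = even[k] + twiddle
        # Both half-length transforms repeat with period N/2, while the
        # factor changes sign, which gives the coefficient k + N/2.
        spectrum[k + half] = even[k] - twiddle
    return spectrum


# Hypothetical test signal: a unit impulse, whose transform is flat.
spectrum = fft([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

Each level of the recursion halves the problem size, so the cost drops from the order of $N^2$ operations for the direct sum to the order of $N \log N$.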
It is possible to analyze the sound in graphic form, but for this purpose the signal should be expressed by the so-called short-time Fourier transform

$$S\{s[n]\}(m, f) \equiv S(m, f) = \sum_{n=-\infty}^{\infty} s[n]\, w[n-m] \exp(-j f n), \tag{3}$$

where $w$ is the window function. Using the above equation, the signal can be presented as a graph of the amplitude spectrum, which is determined as

$$\operatorname{spectrogram}\{s(t)\}(t, f) \equiv |S(t, f)|^2. \tag{4}$$

Plotting the values calculated from Eq. (4) on a 2D graph gives points and their values. There are two axes: OX, which represents time, and OY, which represents frequency. The value of a given point is represented by the shade of color, which is understood as intensity. Sample graphs are shown in Fig. 1.

III. PATTERN RECOGNITION

Let us consider a spectrogram as a set of points $(x, y)$ with intensity in the range $\langle 0, 1 \rangle$. On the spectrogram, the most important features have the brightest shade, so their intensity takes the smallest values.

First, let us focus on the pattern creation process. A newly hired employee is asked to repeat his/her name at least 10 times. Each repetition is one recording. Then, 10 spectrograms are created and used to build a pattern from these recordings. In an ideal world, all samples would look similar. In practice, however, this is not the case because of poor recording quality, noise, and many other factors. For each sample, we find the value $z_{max}$ with the lowest saturation. This value allows us to define the threshold value $\mu$ as

$$\mu = \begin{cases} 1.2 \cdot z_{max} & \text{if } \mu \neq 0 \\ 0.2 & \text{if } \mu = 0 \end{cases}. \tag{5}$$

Using the value of $\mu$, it is possible to create a vector of the most characteristic points (all points satisfying the condition in Eq. (5)) in the following form

$$\xi^k = \left( x_0^k, y_0^k, z_0^k, x_1^k, y_1^k, z_1^k, \ldots, x_m^k, y_m^k, z_m^k \right), \tag{6}$$

where $k$ is the number of a specific record, and $x_0^k$ and $y_0^k$ are the coordinates of the point with the lowest saturation, equal to $z_0^k$.

In this way, $k$ sets are created. All values are grouped by the k-nearest neighbors classifier to remove points that lie a short distance apart in each set. A probability estimator is defined as

$$\hat{p}(k|x) = \frac{1}{K} \sum_{i=1}^{n} I\left(\rho(x, x_i) \le \rho(x, x^{(k)})\right) I(y_i = k), \qquad k = 1, \ldots, L, \tag{7}$$

where $\rho(\cdot)$ is a metric and $x^{(k)}$ is the $k$-th nearest sample to the point $x$. Using these, the classifier is formulated as

$$\hat{d}_{KNN}(x) = \arg\max_k \hat{p}(k|x). \tag{8}$$

After the points are analyzed, the sets may have different numbers of elements. To fix this, the sets are pruned to the size of the smallest set. Then, all
sets $\xi^k$ define the intervals for the points in the pattern. The limits of these intervals are determined as

$$
\begin{gathered}
\left\langle \min\{x_0^k\}, \max\{x_0^k\} \right\rangle, \left\langle \min\{y_0^k\}, \max\{y_0^k\} \right\rangle, \left\langle \min\{z_0^k\}, \max\{z_0^k\} \right\rangle, \ldots, \\
\left\langle \min\{x_m^k\}, \max\{x_m^k\} \right\rangle, \left\langle \min\{y_m^k\}, \max\{y_m^k\} \right\rangle, \left\langle \min\{z_m^k\}, \max\{z_m^k\} \right\rangle, \\
k \in \{1, 2, \ldots, 10\}.
\end{gathered} \tag{9}
$$

IV. EXPERIMENTS

Figure 2: A graphical summary of the obtained results for all samples in the test database.

The proposed method was tested on a small dataset consisting of only 60 samples. Half of them contained the three words "James Tiberius Kirk" spoken by the one person identified with this data (the so-called owner). The remaining 30 samples were created by three different people (so-called counterfeiters), each of whom recorded 10 samples.

The pattern was modeled using only 10 randomly selected samples from the owner. Then, all samples in the database were checked for a pattern match. If the compatibility was at least 80%, the sample was marked as the owner's. Otherwise, it was marked as a falsification.

The effectiveness of the proposed technique was verified by grouping the samples as $TP$ (true positive), $TN$ (true negative), $FP$ (false positive), and $FN$ (false negative). For the results divided in this way, accuracy $\Gamma$, Dice's coefficient $\Lambda$, overlap $\Psi$, sensitivity $\Upsilon$, and specificity $\Phi$ were calculated according to

$$\Gamma = \frac{TP + FN}{TP + TN + FP + FN}, \tag{10}$$

$$\Lambda = \frac{2TP}{2TP + FP + FN}, \tag{11}$$

$$\Psi = \frac{TP}{TP + FP + FN}, \tag{12}$$

$$\Upsilon = \frac{TP}{TP + FN}, \tag{13}$$

$$\Phi = \frac{TN}{TN + FP}. \tag{14}$$

Table I: Obtained solutions for voice recognition

TP    TN    FP    FN    Γ       Λ       Ψ       Υ       Φ
27    3     6     24    0.85    0.64    0.47    0.53    0.33

The distribution of correctly and incorrectly classified samples is presented in Figs. 2, 3, 4, 5, and 6. In the case of the owner, only 10% of the genuine samples were incorrectly classified. Noise or recording time may be the reason. For the samples made by three different counterfeiters, the average fraud detection rate was 80%, which is a good result considering the number of samples. A more detailed analysis of the measurements is shown in Tab. I, where the average effectiveness is 85%. The similarity coefficient reached 0.64, which is quite a high value. However, it is worth noting that the obtained data should be regarded as lying within a fairly wide error range. The same applies to the other coefficients: the probability of obtaining a negative classification given that the sample is genuine is 0.33, and the probability of positive verification for a fraud is 0.53. The obtained results indicate a high degree of effectiveness despite the small number of sound samples and the extraction and classification technique itself.

V. CONCLUSIONS

In this paper, the idea of audio analysis based on pattern matching with k-nearest neighbors was presented. It is important to develop further security techniques that reduce the number of calculations, simplify operation, and increase precision. The technique was implemented and tested on a small dataset consisting of only 60 samples. Half of them belonged to one person (called the owner), and the rest came from three other people, served as forgeries, and were used for verification purposes. Due to noise and differing recording times, the program incorrectly classified 10% of the genuine records. However, this does not change the fact that the effectiveness of the proposed idea reached almost 85%. It is worth noting that the method was tested for $k = 4$, and increasing the number of neighbors decreased the correctness of classification, which may be due to the number of samples.

An important aspect of further research is enlarging the database with a much larger number of recordings, including recordings with more noise or with problems affecting the speaker's voice. It is particularly important to be able to bypass hoarseness or remove a cough. As for accuracy, the use of other, more complex classification methods (such as neural networks) may prove to be a much more favorable approach.

REFERENCES

[1] F. Beritelli, G. Capizzi, G. L. Sciuto, C. Napoli, and F. Scaglione. Automatic heart activity diagnosis based on Gram polynomials and probabilistic neural networks. Biomedical Engineering Letters, 8(1):77–85, 2018.
[2] D. Birvinskas, V. Jusas, I. Martisius, and R. Damaševičius. EEG dataset reduction and feature extraction using discrete cosine transform. In Computer Modeling and Simulation (EMS), 2012 Sixth UKSim/AMSS European Symposium on, pages 199–204. IEEE, 2012.
Figure 3: Confusion matrix for the owner's sample classification.

Figure 4: Confusion matrix for the first counterfeiter's sample classification.

Figure 5: Confusion matrix for the second counterfeiter's sample classification.

Figure 6: Confusion matrix for the third counterfeiter's sample classification.

[3] W. T. Cochran, J. W. Cooley, D. L. Favin, H. D. Helms, R. A. Kaenel, W. W. Lang, G. Maling, D. E. Nelson, C. M. Rader, and P. D. Welch. What is the fast Fourier transform? Proceedings of the IEEE, 55(10):1664–1674, 1967.
[4] J. W. Cooley, P. A. Lewis, and P. D. Welch. The fast Fourier transform and its applications. IEEE Transactions on Education, 12(1):27–34, 1969.
[5] R. Damaševičius, C. Napoli, T. Sidekerskienė, and M. Woźniak. IMF mode demixing in EMD for jitter analysis. Journal of Computational Science, 22:240–252, 2017.
[6] J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami. Internet of things (IoT): A vision, architectural elements, and future directions. Future Generation Computer Systems, 29(7):1645–1660, 2013.
[7] P. N. Mahalle, P. A. Thakre, N. R. Prasad, and R. Prasad. A fuzzy approach to trust based access control in internet of things. In Wireless Communications, Vehicular Technology, Information Theory and Aerospace & Electronic Systems (VITAE), 2013 3rd International Conference on, pages 1–5. IEEE, 2013.
[8] Z. Marszałek. Parallelization of fast sort algorithm. In International Conference on Information and Software Technologies, pages 408–421. Springer, 2017.
[9] Z. Marszałek. Parallelization of modified merge sort algorithm. Symmetry, 9(9):176, 2017.
[10] C. Perera, C. H. Liu, and S. Jayawardena. The emerging internet of things marketplace from an industrial perspective: A survey. IEEE Transactions on Emerging Topics in Computing, 3(4):585–598, 2015.
[11] D. Połap. Extraction of specific data from a sound sample by removing additional distortion. In Computer Science and Information Systems (FedCSIS), 2017 Federated Conference on, pages 353–356. IEEE, 2017.
[12] D. Połap and M. Woźniak. Extraction and analysis of voice samples based on short audio files. In International Conference on Information and Software Technologies, pages 422–431. Springer, 2017.
[13] N. Romano, A. Scivoletto, and D. Połap. A real-time audio compression technique based on fast wavelet filtering and encoding. In Computer Science and Information Systems (FedCSIS), 2016 Federated Conference on, pages 497–502. IEEE, 2016.
[14] S. Sicari, A. Rizzardi, L. A. Grieco, and A. Coen-Porisini. Security, privacy and trust in internet of things: The road ahead. Computer Networks, 76:146–164, 2015.
[15] J. T. Starczewski, S. Pabiasz, N. Vladymyrska, A. Marvuglia, C. Napoli, and M. Woźniak. Self organizing maps for 3D face understanding. In International Conference on Artificial Intelligence and Soft Computing, pages 210–217. Springer, 2016.
[16] M. Vasiljevas, R. Turčinas, and R. Damaševičius. Development of EMG-based speller. In Proceedings of the XV International Conference on Human Computer Interaction, page 7. ACM, 2014.
[17] W. Wei, X. Fan, H. Song, and H. Wang. Video tamper detection based on multi-scale mutual information. Multimedia Tools and Applications, pages 1–18, 2017.
[18] W. Wei, Q. Xu, L. Wang, X. Hei, P. Shen, W. Shi, and L. Shan. GI/Geom/1 queue based on communication model for mesh networks. International Journal of Communication Systems, 27(11):3013–3029, 2014.
[19] A. Whitmore, A. Agarwal, and L. Da Xu. The internet of things: A survey of topics and trends. Information Systems Frontiers, 17(2):261–274, 2015.
[20] M. Woźniak and D. Połap. Adaptive neuro-heuristic hybrid model for fruit peel defects detection. Neural Networks, 98:16–33, 2018.
[21] M. Woźniak, D. Połap, G. Borowik, and C. Napoli. A first attempt to cloud-based user verification in distributed system. In Asia-Pacific Conference on Computer Aided System Engineering (APCASE), pages 226–231. IEEE, 2015.
[22] M. Woźniak, D. Połap, L. Kośmider, and T. Cłapa. Automated fluorescence microscopy image analysis of Pseudomonas aeruginosa bacteria in alive and dead stadium. Engineering Applications of Artificial Intelligence, 67:100–110, 2018.