Using Models of Parallel Specialized Processors to Solve the Problem of Signal Separation

V A Zasov¹

¹Samara State Transport University, Svobody street, 2B, Samara, Russia, 443066

e-mail: vzasov@mail.ru

Abstract. This paper considers models of highly efficient specialized processors for parallel data processing in the problem of extracting individual signals from an additive mixture of several signals. The proposed models of recursive, nonrecursive, and regularization-based parallel specialized processors make it possible to solve the signal separation problem with a variety of algorithms. The advantage of regularization-based processors is that they keep the solution stable when the object parameters are uncertain and the inverse problem of signal separation is ill-posed. The paper presents the results of an asymptotic analysis of the computational complexity of the processors. These results estimate the time needed to solve the problem with specialized processors and identify the conditions for their efficient use.

1. Introduction

The problem of signal separation consists in determining source signals that are unavailable for direct measurement from signals measured at accessible points, where each measured signal is an additive mixture of source signals distorted during transmission. The algorithms for solving this problem are computationally expensive: for many applications, their complexity is of order $O(N^3)$, where $N$ is the number of signal sources [1], which makes them difficult to use. Parallelizing the computations is the conventional way to reduce the time needed to solve the problem [2,3]; parallel signal-separation algorithms have been developed for multicore processors, multiprocessor systems with shared and distributed memory, and multicomputer systems [4].
Solving this problem is needed in many practical fields, such as monitoring and diagnosis of technical facilities [5], communications, medical diagnosis, and speech [6] and image [7] processing. In complex facilities, the measured signals are additive mixtures of signals from many components, and in most practical applications the parameters describing the state of specific components cannot be extracted without signal separation.

A further significant performance improvement is possible with specialized processors whose architecture and computational processes most closely match the structure of the algorithms for the class of problems in question [1,8].

Signal separation methods can be classified into two groups: deterministic and statistical [1]. Deterministic methods rely on prior information about the signal transmission channels (their statistical, frequency, amplitude, and other characteristics); that is, the transmission channels and the signals are known. Statistical methods rely on prior information about the signal sources, such as the absence of correlation between sources and knowledge of the signal distribution laws. In this case, explicit information about the transmission channels is unavailable, and only the observed signals are known; for that reason, the methods of this group are often called "blind" [9]. Thus, separating the signals by source reduces to using a deterministic or statistical method to calculate a separating matrix that is equal, or close in terms of specific criteria, to the inverse of the mixing matrix.

V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) Data Science V A Zasov

The functionality of commercially available specialized digital signal processors and field-programmable gate arrays is insufficient for solving the complex problem of signal separation.
Reference [10] only proposes the basic functions and objectives of specialized processors for signal separation and restoration, and for the processor models described in [11], the analysis of the computational complexity of parallel processing is insufficient to identify the conditions for their efficient use. Moreover, those processors do not provide stable solutions when the object properties are uncertain and the inverse problem of signal separation is ill-posed. It is advisable for specialized processors to have a regular structure and a neural-network architecture [12]. The purpose of this paper is to develop parallel specialized processors for signal separation when the object parameters are uncertain, and to analyze asymptotically the computational complexity of parallel processing in order to identify the conditions for the efficient use of the processors.

2. Research Area

Since there are many algorithms for solving the problem of signal separation [1,9-13], parallel specialized processors should be versatile within this class of problems. We assume that the processor model consists of two units: a versatile (generic) unit, which carries out the procedural steps of the algorithm, and a specialized unit, which reproduces the structure of the algorithm.

Let the model of signal formation be a linear multidimensional system with $N$ inputs and $M$ outputs [1,14]. The model's input signals are $s_n(k)$, $n = 1,2,\ldots,N$; its output signals are $x_m(k)$, $m = 1,2,\ldots,M$. The input signals come from sources unavailable for direct measurement, and the output signals come from receivers such as detectors and antennas. We assume that each of the $M$ outputs is linked with all $N$ inputs through linear signal-transmission channels.
The signal-formation model is described by the discrete-convolution equations (1), in which the $m$th observed signal is an additive mixture of channel-distorted source signals and noise [1,14]:

$$x_m(k) = \sum_{n=1}^{N}\sum_{g=0}^{G-1} h_{mn}(g,l)\,s_n(k-g) + y_m(k), \quad m = 1,\ldots,M, \qquad (1)$$

where $h_{mn}(g,l)$ is an element of the $M \times N$ mixing matrix $h(g,l)$ of the channels' pulse responses; $y(k)$ is the noise vector; and $g = 0,\ldots,G-1$ and $k = 0,\ldots,K-1$ index the samples of the channels' pulse responses and of the signals, respectively. We assume that the channels' pulse responses $h_{mn}(g,l)$ are finite and depend on a parameter vector $l$ (time, relative locations of sources and receivers, etc.) [14]. In general, the solution of (1) for the source signals can be written as

$$\hat s_n(k) = \sum_{m=1}^{M}\sum_{g=0}^{G-1} w_{nm}(g,l)\,x_m(k-g), \quad n = 1,\ldots,N, \qquad (2)$$

where $\hat s_n(k)$ are the estimates of the source signals and $w_{nm}(g,l)$ are the pulse responses of the separating filters; they form the separating matrix $w(g,l)$, which is equal or close to the inverse of the matrix $h(g,l)$.

We split the separation of signals by source into two steps. In step 1, the elements $w_{nm}(g,l)$ of the separating matrix $w(g,l)$ are determined from the measured dynamic properties of the channels or from signal parameters; that is, the signal-separation algorithm is adjusted. The algorithm for computing $w_{nm}(g,l)$ (the adjustment algorithm) reflects the specifics of the chosen signal-separation algorithm. In step 2, the signals are separated by the digital separating filters adjusted in step 1; this computation is the same for different signal-separation algorithms.
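To make the signal-formation model (1) concrete, the following minimal Python sketch evaluates it directly for small nested lists. It assumes zero initial conditions ($s_n(k) = 0$ for $k < 0$) and real-valued samples; the function name `mix` is hypothetical and is not part of the paper's processor models.

```python
def mix(h, s, y=None):
    """Evaluate eq. (1): x_m(k) = sum_n sum_g h_mn(g) * s_n(k - g) + y_m(k).

    h: M x N nested lists of channel pulse responses, each of length G
    s: N source signals, each of length K
    y: optional M x K noise; samples with k - g < 0 are taken as zero
    """
    M, N = len(h), len(h[0])
    G, K = len(h[0][0]), len(s[0])
    x = []
    for m in range(M):
        row = []
        for k in range(K):
            # discrete convolution over all sources n and pulse-response taps g
            acc = sum(h[m][n][g] * s[n][k - g]
                      for n in range(N) for g in range(min(G, k + 1)))
            row.append(acc + (y[m][k] if y is not None else 0.0))
        x.append(row)
    return x


# Two sources, two receivers, two-tap channels
h = [[[1.0, 0.5], [0.3, 0.0]],
     [[0.2, 0.0], [1.0, 0.4]]]
s = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
x = mix(h, s)
```

Direct evaluation costs $O(MNGK)$ operations, so this sketch is only meant for generating small test mixtures, for example to check a separation procedure against known sources.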
With this in mind, the model of a parallel specialized processor comprises two computational units: the adjusting processor (AP), which is the versatile unit, and the functional processor (FP), which is the specialized unit. The nonrecursive, recursive, and regularization-based processor models considered below differ in the methods used to solve (1) but share the same basic elements.

3. Model for a Nonrecursive Parallel Specialized Processor

The model for the nonrecursive parallel specialized processor solves the system of equations (1) by inverting the mixing matrix $h(g,l)$. The nonrecursive processor model [10,11] is best written in a form convenient for parallel stream processing in the time domain:

$$\begin{aligned} \hat s_1(k) &= \sum_{m=1}^{M}\sum_{g=0}^{G-1} w_{1m}(g,l)\,x_m(k-g),\\ &\;\;\vdots\\ \hat s_N(k) &= \sum_{m=1}^{M}\sum_{g=0}^{G-1} w_{Nm}(g,l)\,x_m(k-g), \end{aligned}$$

where $\hat s_1(k),\ldots,\hat s_N(k)$ are the calculated signals, which approximate the true signals $s_1(k),\ldots,s_N(k)$ at the points where they are formed, and $w_{nm}(g,l)$ are the elements of the separating matrix $w(g,l)$ obtained from

$$w_{nm}(g,l) = \frac{1}{K}\sum_{k=0}^{K-1} W_{nm}(\omega_k,l)\exp\left(i\frac{2\pi kg}{K}\right).$$

The frequency transfer function $W_{nm}(\omega,l)$ is the element in the $n$th row and $m$th column of the spectral matrix $W(\omega,l)$, which is the inverse of the matrix $H(\omega,l)$: $W(\omega,l) = H^{-1}(\omega,l)$ for $M = N$ (or the pseudo-inverse, $W(\omega,l) = H^{+}(\omega,l)$, for $M \neq N$).

The functional processor separates the measured signals $x_m(k)$ according to the sources $s_n(k)$ they belong to. It implements a model that is the inverse of the signal-formation model and has a regular homogeneous structure composed of $N \cdot M$ adjustable filters (AF) and $N$ adders (A). The filters compute linear convolutions simultaneously and independently of one another, and the adders sum their outputs.
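The structure of the functional processor (independent filters $AF_{nm}$ feeding adders $A_n$) can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the filters are plain FIR convolutions with zero initial conditions, the names `fir` and `functional_processor` are hypothetical, and the thread pool only shows that the $N \cdot M$ convolutions have no mutual dependencies (in CPython, threads do not actually speed up pure-Python arithmetic).

```python
from concurrent.futures import ThreadPoolExecutor


def fir(w, x):
    """Adjustable filter AF_nm: y(k) = sum_g w[g] * x[k - g], zero initial conditions."""
    return [sum(w[g] * x[k - g] for g in range(min(len(w), k + 1)))
            for k in range(len(x))]


def functional_processor(w, x):
    """w: N x M nested lists of separating pulse responses w_nm(g);
    x: M measured signals of equal length K. Returns the N estimates s_hat_n(k)."""
    N, K = len(w), len(x[0])
    pairs = [(n, m) for n in range(N) for m in range(len(x))]
    # The N*M convolutions share no intermediate results, so they can run concurrently.
    with ThreadPoolExecutor() as pool:
        outputs = list(pool.map(lambda nm: fir(w[nm[0]][nm[1]], x[nm[1]]), pairs))
    s_hat = [[0.0] * K for _ in range(N)]
    for (n, _), y in zip(pairs, outputs):  # adder A_n sums the outputs of AF_n1 ... AF_nM
        for k in range(K):
            s_hat[n][k] += y[k]
    return s_hat


w = [[[0.0, 1.0], [0.0]],   # s_hat_1 = x_1 delayed by one sample
     [[0.5], [0.5]]]        # s_hat_2 = average of x_1 and x_2
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
s_hat = functional_processor(w, x)
```

Because every filter reads only its own input $x_m(k)$ and its own coefficients, the processing time is set by one filter and one adder rather than by the number of sources or receivers.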
The computational complexity $L^{\text{nonrec}}_{FP}(K)$ of the functional processor's algorithm, which determines the processing time, is characterized by the height of its parallel form; it is of order $L^{\text{nonrec}}_{FP}(K) = O(K)$ and does not depend on the number of signal sources.

The adjusting processor calculates the coefficients $w_{nm}(g,l)$ of the AFs from measurements of the channels' transient responses (deterministic methods) or from the characteristics of the signal sources (statistical separation methods) [9,13]. For deterministic methods, the algorithm for computing the coefficients $w_{nm}(g,l)$ consists of the following steps: applying a fast Fourier transform (FFT) to the channels' transient responses to obtain the mixing spectral matrix $H(\omega,l)$; inverting the spectral matrix $H(\omega,l)$ to obtain the separating spectral matrix $W(\omega,l)$; and applying an inverse fast Fourier transform (IFFT) to the elements of the separating matrix $W(\omega,l)$ to obtain the AF weight coefficients, given by the matrix $w(g,l)$.

The parallel form of the first and third steps of the adjustment algorithm has a width of $N \cdot M$ and is implemented by $N \cdot M$ FFT and IFFT units. The computational complexity $L^{\text{nonrec}}_{AP_{1,3}}(G)$, which determines the time needed to complete these steps according to the heights of their parallel forms, is of order $L^{\text{nonrec}}_{AP_{1,3}}(G) = O(G\log_2 G)$. The parallel form of the algorithm for computing the separating matrix has a width of $G$ and is implemented with $G$ units that invert matrices of order $N$ (assuming $N = M$). Each of these units, in turn, implements a parallel matrix-inversion algorithm (e.g., [3]) with a width of $O(N^4)$.
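The three steps of the deterministic adjustment algorithm can be sketched in Python as follows, assuming $M = N$, real channel responses, and transforms of length $G$. Two simplifications are deliberate: a naive $O(G^2)$ DFT stands in for the FFT/IFFT units, and sequential Gauss-Jordan elimination stands in for the parallel matrix-inversion algorithm of [3]. With length-$G$ transforms, the resulting filters invert the channels exactly for circular (period-$G$) convolution; a practical implementation would typically use longer, zero-padded transforms. All function names are hypothetical.

```python
import cmath


def dft(a, inverse=False):
    """Naive O(G^2) DFT standing in for the FFT/IFFT units; the inverse includes 1/G."""
    G = len(a)
    sign = 1 if inverse else -1
    out = [sum(a[g] * cmath.exp(sign * 2j * cmath.pi * k * g / G) for g in range(G))
           for k in range(G)]
    return [v / G for v in out] if inverse else out


def invert(A):
    """Gauss-Jordan inversion with partial pivoting of a small complex matrix."""
    n = len(A)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("mixing matrix is singular at this frequency")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def adjust_nonrecursive(h):
    """h: N x N nested lists of channel pulse responses h_mn(g), each of length G.
    Returns the separating filters w_nm(g) such that W(w_k) = H(w_k)^-1 for every bin."""
    N, G = len(h), len(h[0][0])
    H = [[dft(h[m][n]) for n in range(N)] for m in range(N)]       # step 1
    W = [[[0j] * G for _ in range(N)] for _ in range(N)]
    for k in range(G):                      # step 2: G independent matrix inversions
        Wk = invert([[H[m][n][k] for n in range(N)] for m in range(N)])
        for n in range(N):
            for m in range(N):
                W[n][m][k] = Wk[n][m]
    # step 3: real-valued channels give (numerically) real separating filters
    return [[[v.real for v in dft(W[n][m], inverse=True)] for m in range(N)]
            for n in range(N)]


h = [[[1.0, 0.3], [0.4, 0.1]],
     [[0.2, 0.05], [1.0, -0.2]]]
w = adjust_nonrecursive(h)
```

The loop over `k` corresponds to the $G$ independent matrix inversions, which is where the width $G$ of the parallel form comes from; the per-element transforms in steps 1 and 3 are likewise independent of one another.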
The height $L^{\text{nonrec}}_{AP_2}(N) = O(\log_2^2 N)$ of this algorithm's parallel form determines the time needed to invert the spectral matrix $H(\omega,l)$.

The computational complexity $L^{\text{nonrec}}_{AP_{1,2,3}}(G,N)$ of the adjusting processor's algorithm is significantly higher than the complexity $L^{\text{nonrec}}_{FP}(K) = O(K)$ of the functional processor's algorithm:

$$O(G\log_2 G) + O(\log_2^2 N) = O(G\log_2 G + \log_2^2 N) > O(K).$$

For instance, at $K = G = N$ the ratio is

$$\frac{L^{\text{nonrec}}_{AP_{1,2,3}}(G,N)}{L^{\text{nonrec}}_{FP}(K)} \approx \frac{G\log_2 G + \log_2^2 G}{G} = \log_2 G + \frac{\log_2^2 G}{G} \approx \log_2 G,$$

since the second term becomes negligible as $G$ grows. Thus, separating signals with the proposed nonrecursive processor is acceptable if the parameters of the mixing matrix can be assumed invariable within the signal interval determined by $K \approx G\log_2 G$, that is, if the signal-formation model is quasistationary. Furthermore, because the width $O(N^4)$ of the parallel matrix-inversion algorithm grows polynomially with the number of signal sources, the nonrecursive processor model can be used in practice to separate the signals $s_n(k)$ when the number of signal sources is small.

4. Model for a Recursive Parallel Specialized Processor

Figure 1 shows a recursive parallel specialized processor model that implements the iterative method for solving the system of equations (1). The model's functional processor (FP) has a regular homogeneous structure composed of $M$ identical processing units (PU) [10,11]. All PUs operate in parallel, and each implements the recursive algorithm for extracting one signal $s_n(k)$ from an additive mixture of several signals. The computational complexity $L^{\text{rec}}_{FP}(K\nu)$ of the FP algorithm, which determines its operating time, is characterized by the height of its parallel form; it is of order $L^{\text{rec}}_{FP}(K\nu) = O(K\nu)$, where $\nu$ is the number of iterations, and does not depend on the number of signal sources.
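The paper does not spell out the update rule of the processing units, so the following is only a plausible sketch: a Jacobi-type iteration for a single frequency bin, in which unit $PU_n$ computes $\hat s_n^{(\nu+1)} = H_{nn}^{-1}\big(x_n - \sum_{j \ne n} H_{nj}\,\hat s_j^{(\nu)}\big)$ from the previous iterate only, so all units can update simultaneously. The function name `recursive_fp`, the zero initial guess, and the fixed iteration count are assumptions.

```python
def recursive_fp(H, x, iterations=50):
    """Jacobi-type sketch for one frequency bin (N = M), hypothetical.

    H: N x N complex mixing coefficients H_nj at this bin; x: N measured values.
    PU_n uses the inverse diagonal channel 1/H_nn and the cross channels H_nj, j != n.
    """
    N = len(H)
    inv_diag = [1 / H[n][n] for n in range(N)]
    s = [0j] * N  # zero initial conditions
    for _ in range(iterations):
        # every PU reads only the previous iterate, so all N updates are simultaneous
        s = [inv_diag[n] * (x[n] - sum(H[n][j] * s[j] for j in range(N) if j != n))
             for n in range(N)]
    return s


H = [[1.0, 0.2],
     [0.3, 1.0]]        # each receiver is dominated by "its own" source
x = [0.6, -1.7]         # produced by the true sources s = (1, -2)
s = recursive_fp(H, x)
```

Strict diagonal dominance of $H$ (each $|H_{nn}|$ exceeding the sum of the other magnitudes in its row) is a sufficient condition for this iteration to converge, which corresponds to requiring that the signal of one source predominate at each receiver.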
The adjusting processor (AP) consists of a clock unit (CU) and $M$ groups of devices, each comprising an adjusting unit (AU) and a memory unit (MU). The pulse responses of the filters $AF_{mn}$, $n = 1,\ldots,N$, $m = 1,\ldots,M$, $m \neq n$, need not be calculated by the AP, since the frequency characteristics of these filters are equal to the frequency characteristics of the channels with the same indexes in the mixing matrix $H(\omega,l)$ of the signal-formation model. These characteristics only need to be stored in $MU_m$, $m = 1,\ldots,M$. The pulse responses $\bar h_{11}(g), \bar h_{22}(g), \ldots, \bar h_{MM}(g)$, $g = 0,\ldots,G-1$, of the adjustable inverse filters $AIF_{mn}$ (only for $m = n$) are computed in $AU_m$, $m = 1,\ldots,M$.

Figure 1. Model for a recursive parallel specialized processor (N=M).

The algorithm for computing $\bar h_{mn}(g,l)$, $m = n$, consists of the following steps: applying an FFT to the channels' transient responses $h_{mn}(g,l)$, $m = n$, to obtain the characteristics $H_{mn}(\omega,l)$; computing $\bar H_{mn}(\omega,l) = H_{mn}^{-1}(\omega,l)$; and applying an inverse FFT to the characteristics $\bar H_{mn}(\omega,l)$ to obtain the weight coefficients $\bar h_{mn}(g,l)$, $m = n$, of the AIFs. The parallel form of the first and third steps of the adjustment algorithm has a width of $N$ and is implemented by $N$ FFT and IFFT units. The computational complexity $L^{\text{rec}}_{AP_{1,3}}(G)$, which determines the time needed to complete these steps according to the height of their parallel form, is of order $L^{\text{rec}}_{AP_{1,3}}(G) = O(G\log_2 G)$. The parallel form of the algorithm for computing the inverse channel characteristics $\bar H_{mn}(\omega,l)$ has a width of $O(N \cdot G)$ and is implemented with $N \cdot G$ division units, while its height $L^{\text{rec}}_{AP_2}(N) = O(1)$ is constant.
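The adjustment of the inverse filters $AIF_{nn}$ (transform, reciprocal, inverse transform) can be sketched as below. In this sketch, a naive DFT replaces the FFT/IFFT units, the transforms have length $G$ (so the inversion is exact for circular convolution), and the function names are hypothetical. A channel whose frequency characteristic has a zero cannot be inverted this way, so the sketch rejects it.

```python
import cmath


def dft(a, inverse=False):
    """Naive O(G^2) DFT standing in for the FFT/IFFT units; the inverse includes 1/G."""
    G = len(a)
    sign = 1 if inverse else -1
    out = [sum(a[g] * cmath.exp(sign * 2j * cmath.pi * k * g / G) for g in range(G))
           for k in range(G)]
    return [v / G for v in out] if inverse else out


def adjust_inverse_filters(h_diag, eps=1e-12):
    """h_diag: the N diagonal channel responses h_nn(g), each of length G.
    Returns the pulse responses of the inverse filters AIF_nn."""
    result = []
    for h in h_diag:
        H = dft(h)                                      # step 1: H_nn(w_k)
        if min(abs(v) for v in H) < eps:
            raise ValueError("spectral zero: channel is not invertible")
        H_inv = [1 / v for v in H]                      # step 2: independent divisions
        result.append([v.real for v in dft(H_inv, inverse=True)])  # step 3
    return result


h_diag = [[1.0, 0.5, 0.0, 0.0], [2.0, -0.5, 0.25, 0.0]]
inv = adjust_inverse_filters(h_diag)
```

Only the $N$ diagonal channels are transformed, and each frequency bin needs a single division, which is why the width here is $O(N \cdot G)$ rather than the much larger width required for full matrix inversion.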
The CU synchronizes the transfer of parameters from the AP to the FP, sets the initial conditions, and controls the output registers during the processing iterations. Its algorithm has a constant complexity $L^{\text{rec}}_{CU}(N) = O(1)$. Estimating the computational complexity of the AP and FP algorithms for the recursive processor (e.g., at $K = G$),

$$\frac{L^{\text{rec}}_{AP_{1,2,3}}(G)}{L^{\text{rec}}_{FP}(K\nu)} = \frac{O(G\log_2 G) + O(1) + O(1)}{O(K\nu)} = O(\log_2 G)$$

for a fixed number of iterations $\nu$, leads to the conclusion that signal separation with the proposed recursive processor is acceptable for a quasistationary signal-formation model. At the same time, the width of the AP algorithm's parallel form for the recursive processor is significantly smaller than for the nonrecursive one: $O(N \cdot G) \ll O(N^4 \cdot G)$. This advantage of the recursive processor makes it possible to separate the signals $s_n(k)$ for many more sources when computational resources are limited. For a recursive processor to separate signals stably, the object must allow the signal receivers to be placed so that the signal of a specific source predominates in the linear superposition of signals at the output of each receiver [1].

5. Model for a Regularization-Based Parallel Specialized Processor

If the parameters of the mixing matrix $H(\omega,l)$ or of the source signals $s(k)$ make the signal separation problem ill-posed, or if those parameters are uncertain, one should seek a regularized, stable solution of (1) or of its frequency-domain equivalent from the outset. The proposed model of a regularization-based specialized processor is based on Tikhonov regularization [15]. Two conditions of Tikhonov regularization are imposed in the processor model: minimization of the discrepancy, $\|\tilde H s - \tilde x\| \to \min_s$, as in the least-squares technique (LST); and minimization of the solution norm, $\|s\| \to \min_s$, as in the Moore–Penrose pseudo-inverse of a matrix [16].
s The solution s contained in the processor provides the absolute minimum of the smoothing functional F  s  expressed as F  s  Hs  x   s , 2 where   0 is the regularization parameter;  s  is the stabilizing functional; and H and x are approximate values of H and x for which H  H   H and x  x   , where  and  H are the upper estimates of absolute measurement errors for the signals x and the mixing-matrix elements H . The proposed model uses  s   s as a stabilizing functional. For control purposes, it is more 2 natural and convenient to present signals in time form, so we will write the smoothing functional for M = N and K = G as M K 1 2 N  M K 1 F s     xm  k ,I   xm  k ,I       sn,  k  .2  (3) m 1 k  0 n  m 1 k  0 The signal xm  k ,I  in (3) derives from the separation results redistorted by the signal formation model; that is, V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 295 Data Science V A Zasov N xm  k,I    sn,  k  * hmn (k ,l) , n 1 where sn ,  k , I  are the regularized results of signal separation for the object’s nth node. Under the conditions described above, the smoothing functional can be written as M K 1 2 N  M G 1 F  w     xm  k , I   xm  k , I       wnm ,  g ,I  ,2  m 1 k  0 n  m 1 g  0 an expression that is more suitable for the regularized computation of the elements wnm ,  g ,I  for the separating matrix w  g ,I  , which sets the weights of the functional processor’s AFs. The elements wnm ,  g ,I  for the selected regularization parameter  are determined from the minimum condition of the smoothing functional F   w  , keeping in mind that this functional’s quadratic form is positively definite. 
The elements $w_{nm,\alpha}(g,l)$ can be calculated, for instance, by solving the system of $M \cdot N \cdot G$ equations $\partial F_\alpha(w)/\partial w_{nm}(g,l) = 0$ for $w_{nm,\alpha}(g,l)$ using parallel algorithms for solving systems of linear algebraic equations [2,3]. For known (specified) errors $\delta_H$ and $\delta$, we propose calculating the regularization parameter as the root of the equation

$$\|\tilde H s_\alpha - \tilde x\|^2 = \frac{(\delta + \delta_H\|s_\alpha\|)^2}{\mu(r,M,N)},$$

in which $\alpha$ is the parameter of the solution $s_\alpha$ and $\mu(r,M,N) \ge 1$ is a scaling multiplier determined by the problem dimension ($M \cdot N$) and by the measurement error of the signal and channel parameters (this error depends, in particular, on the resolution $r$ of the analog-to-digital conversion). With the regularization parameter $\alpha$ obtained in this way, the smoothness and discrepancy of the solution $s_\alpha$ are acceptable for practical purposes.

Figure 2 shows the model of a regularization-based specialized processor. The FP is a model inverse to the signal-formation model that separates the measured signals. It has a homogeneous structure and consists of AFs and adders (A). The AP computes the regularization parameter $\alpha$ and adjusts the AFs, whose number is $M^2$. The AP's processing unit (PU) computes the AF parameters with the least-squares technique by minimizing the smoothing functional $F_\alpha(w)$. The elements of the mixing matrix $h(g,l)$, which set the samples of the pulse responses of the signal-formation model's channels, enter the AP inputs, and the AP generates a direct signal-formation model. Disparity evaluation units (DEUs) compute the squared discrepancy for each channel $m$ of the processor. These units receive signals delayed by delay units (DUs), which delay both the signals from the processor inputs and the signals from the outputs of the direct signal-formation model.
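A minimal sketch of the proposed rule for choosing $\alpha$ is given below, assuming a small real system. It solves the Tikhonov normal equations $(\tilde H^{\mathsf T}\tilde H + \alpha I)s_\alpha = \tilde H^{\mathsf T}\tilde x$ by Gaussian elimination. Because the discrepancy $\|\tilde H s_\alpha - \tilde x\|$ grows with $\alpha$ while $\|s_\alpha\|$ shrinks, the difference between the two sides of the equation increases with $\alpha$, so bisection on $\log\alpha$ finds the single root once it is bracketed. The value of $\mu(r,M,N)$ is problem-specific and is simply passed in as a number here; the bracket, the iteration count, and the function names are hypothetical.

```python
import math


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A z = b."""
    n = len(A)
    aug = [list(map(float, row)) + [float(v)] for row, v in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (aug[r][n] - sum(aug[r][j] * z[j] for j in range(r + 1, n))) / aug[r][r]
    return z


def tikhonov(H, x, alpha):
    """s_alpha from (H^T H + alpha I) s = H^T x."""
    cols = range(len(H[0]))
    A = [[sum(row[i] * row[j] for row in H) + (alpha if i == j else 0.0) for j in cols]
         for i in cols]
    b = [sum(row[i] * xv for row, xv in zip(H, x)) for i in cols]
    return solve(A, b)


def discrepancy_gap(H, x, alpha, delta, delta_H, mu):
    """||H s_a - x||^2 - (delta + delta_H ||s_a||)^2 / mu; increasing in alpha."""
    s = tikhonov(H, x, alpha)
    res2 = sum((sum(h * v for h, v in zip(row, s)) - xv) ** 2 for row, xv in zip(H, x))
    norm = math.sqrt(sum(v * v for v in s))
    return res2 - (delta + delta_H * norm) ** 2 / mu


def choose_alpha(H, x, delta, delta_H, mu=1.0, lo=1e-12, hi=1e6, steps=200):
    """Root of discrepancy_gap(alpha) = 0 found by bisection on log(alpha)."""
    if discrepancy_gap(H, x, lo, delta, delta_H, mu) > 0:
        raise ValueError("gap already positive at lo; decrease lo")
    if discrepancy_gap(H, x, hi, delta, delta_H, mu) < 0:
        raise ValueError("gap still negative at hi; increase hi")
    for _ in range(steps):
        mid = math.sqrt(lo * hi)  # geometric midpoint of the bracket
        if discrepancy_gap(H, x, mid, delta, delta_H, mu) > 0:
            hi = mid
        else:
            lo = mid
    return math.sqrt(lo * hi)


alpha = choose_alpha([[1.0, 0.0], [0.0, 1.0]], [3.0, 4.0], delta=1.0, delta_H=0.0)
```

For $\tilde H = I$, $\tilde x = (3, 4)$, $\delta = 1$, $\delta_H = 0$ and $\mu = 1$, the residual is $5\alpha/(1+\alpha)$, so the root is $\alpha = 0.25$, which the sketch reproduces to within rounding.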
It is advisable to minimize the smoothing functional $F_\alpha(w)$ with deterministic parallel optimization algorithms [17] designed for multicore processors. All the AFs operate in parallel, independently of one another, which makes the proposed processor fast and reliable. The processor model shown in Figure 2 is a generalized model for obtaining stable solutions of system (1), and it is therefore highly complex. For practical applications, it can be significantly simplified by using prior information about the signal-formation model (such as the presence of reference inputs [18]) or simpler parallel regularization algorithms [19].

Figure 2. Model for a regularization-based parallel specialized processor.

6. Modeling Results

Figure 3 shows the modeling results for test signals separated by the nonrecursive parallel specialized processor implemented on a multicore GPU-based architecture. The signal-formation model had three signal sources: the first two were triangular pulses of different frequencies and shapes, and the third was a speech signal. The signal receivers used 8-bit ADCs with a sampling rate of 12 kHz. Figure 3 shows the initial signals (top), the additive mixtures of signals at each receiver (middle), and the extraction results for each signal (bottom). The signal separation error does not exceed 10%, which is acceptable for many engineering applications.

Figure 3. Modeling results for test signals separated with a nonrecursive parallel specialized processor (panel axes: amplitude, V; time, s).

The computational experiments for the same example, shown in Figure 4, demonstrate notably shorter task times (the Independent Component Analysis (ICA) algorithm [9] was used).

Figure 4. Experimental dependence of the processing time (ms) on the number of signal samples for signal separation with (a) a serial processor and (b) a parallel nonrecursive specialized processor.

7. Basic Conclusions

This paper proposed models of highly efficient nonrecursive, recursive, and regularization-based parallel specialized processors for solving the problem of signal separation. Once adjusted, the processors solve the problem in a time that does not depend on the number of signal sources; that is, their task time is of order $T_{FP}(N) = O(1)$. These models are applicable where the parameters of the signal-formation model change slowly (the model is quasistationary). Regularization-based processors keep the solution stable when the object parameters are uncertain and the inverse problem of signal separation becomes ill-posed. The regular homogeneous structure of the functional processor can be conveniently implemented as an integrated circuit or a multicore computing system.

8. References

[1] Zasov V A 2013 Algorithms and Computational Devices for Separating and Restoring Signals in Multivariable Dynamic Systems (Samara: Samara State Transport University Press) p 233
[2] Gergel V P 2007 Theories and Applications of Parallel Computations (Moscow: IT Internet University: BINOM Knowledge Laboratory) p 423
[3] Demiyanovich Y K, Burova I G, Yevdokimova T O, Ivantsova O N and Miroshnichenko I D 2012 Parallel Algorithms: Development and Implementation (Moscow: IT Internet University: BINOM Knowledge Laboratory) p 344
[4] Patterson D A and Hennessy J L 2012 Computer Organization and Design (Saint Petersburg: Peter) p 784
[5] Vasin N N and Diyazitdinov R R 2016 A machine vision system for inspection of railway track Computer Optics 40(3) 410-415 DOI: 10.18287/2412-6179-2016-40-3-410-415
[6] Ifeachor E C and Jervis B W 2004 Digital Signal Processing: A Practical Approach (Moscow: Williams Publishing House) p 992
[7] Denisova A Y, Juravel Y N and Myasnikov V V 2016 Estimation of parameters of a linear spectral mixture for hyperspectral images with atmospheric distortions Computer Optics 40(3) 380-387 DOI: 10.18287/2412-6179-2016-40-3-380-387
[8] Mitropolskiy Y I 1985 Problems of Versatile to Custom Tools Ratio in Computational Systems Kibernetika i vychislitelnaya tekhnika 1 35-48
[9] Cichocki A and Amari Sh 2002 Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications (New York: John Wiley & Sons, Ltd) p 555
[10] Zasov V A and Romkin M V 2012 Parallel Computations for the Signal Separation Problem in Multidimensional Dynamic Systems Parallel Computations and Management Objectives. Proc. of the 6th Int. Conf. (Moscow: Russian Academy of Science, Trapeznikov Institute of Control Science Press) 96-102
[11] Zasov V A and Romkin M V 2013 Parallel Computational Models for Solving the Problem of Signal Separation Vestnik transporta Povolzhiya 6(42) 77-86
[12] Haykin S 2006 Neural Networks: A Comprehensive Foundation (Moscow: Williams Publishing House) p 1104
[13] Kravchenko V F 2007 Digital Signal and Image Processing in Radiophysical Applications (Moscow: Nauka, Fizmatlit) p 544
[14] Zasov V A and Nikonorov Ye N 2017 Modeling and Investigating the Stability of a Solution to the Inverse Problem of Signal Separation CEUR Workshop Proceedings 1904 78-84
[15] Tikhonov A N and Arsenin V Y 1986 Methods for Solving Ill-posed Problems: a Textbook for Universities (Moscow: Nauka, Fizmatlit) p 288
[16] Tyrtyshnikov E E 2007 Matrix Analysis and Linear Algebra (Moscow: Nauka, Fizmatlit) p 480
[17] Strongin R G, Gergel V P, Grishagin V A and Barkalov K A 2013 Parallel Computation in Global Optimization Problems (Moscow: Moscow State University Press) p 285
[18] Dzhigan V I 2013 Adaptive Signal Filtering: Theory and Algorithms (Moscow: Tekhnosfera) p 528
[19] Zhdanov A I and Sidorov Y V 2015 Parallel implementation of a randomized regularized Kaczmarz's algorithm Computer Optics 39(4) 536-541 DOI: 10.18287/0134-2452-2015-39-4-536-541