
Methods Assessment the Probability Density of Discrete Signals in Telecommunications

Yuriy Kropotov

Aleksey Belov

Murom Institute (branch) "Vladimir State University named after Alexander and Nicholay Stoletovs", Murom, Russia


This paper investigates problems and methods of modeling acoustic signals in information and control systems for audio exchange communications. It addresses the estimation and approximation of probability density functions, which can help to distinguish acoustic speech signals from external acoustic noise. We consider direct and indirect methods, histogram estimation techniques, and ways of dealing with ill-posed problems.

Keywords: probability density, discrete signals, telecommunication systems, distribution function

Introduction

The estimation of the distributions of speech signals and noise, as of data of any nature, is based on empirical data derived from experimental measurements [1]. There are many methods for obtaining such estimates; they are divided into parametric and nonparametric methods, and into direct and indirect methods.

Parametric, or classical, methods are understood as methods in which the probability density is known up to its parameters, that is, it has the form $f(x, \alpha)$, where $\alpha$ is the parameter vector.

In: A. Kononov et al. (eds.): DOOR 2016, Vladivostok, Russia, published at http://ceur-ws.org

If the function $f(x, \alpha)$ is not a probability density, the methods for estimating the parameter vector $\alpha$ are considered non-parametric. In this case the task is one of approximating the observed data. The resulting approximating function $f(x, \alpha)$ must satisfy the constraints [1, 3]

$$ f(x, \alpha) \ge 0 \quad \text{and} \quad \int_{-\infty}^{\infty} f(x, \alpha)\,dx = 1. \tag{1} $$

A clear distinction between parametric and non-parametric methods is not always possible. Thus, the problem of approximating the data by a mixture of known distributions represented by density functions $\varphi_k(x, \theta_k)$,

$$ f(x, \alpha) = \sum_k a_k \varphi_k(x, \theta_k), \qquad \sum_k a_k = 1, $$

is more appropriately classified as a non-parametric task. However, if the coefficients $a_k \ge 0$ are known, the task can be regarded as parametric. Non-parametric problems also include least squares problems and linear and nonlinear regression. Methods for solving such problems are also called projection methods. It should be noted that the above definition of non-parametric methods is used only in mathematical statistics. In systems theory, optimization, and approximation theory, by contrast, such methods are called parametric [4, 7], since the task is to find a finite number of unknown parameters.

Direct and indirect methods of estimating the probability density

In a number of studies, methods of estimating the probability density are divided into direct and indirect. The hallmark of direct methods is that they use a direct relation between the required density and the empirical data. For example, direct methods include those based on solving the integral equation that relates the probability density to the empirical distribution function,

$$ \int_{-\infty}^{\infty} I(x - v) f(v)\,dv = F_n(x), \tag{2} $$

where $F_n(x)$ is the empirical distribution function, which is a step function. The solution of equation (2) gives the desired estimate of the probability density. The empirical distribution function is given by

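$$ F_n(x) = \frac{1}{N} \sum_{l=1}^{N} I_{(-\infty, x]}(x_l), \tag{3} $$

where $I_{(-\infty, x]}(x_l)$ is the indicator of the set $(-\infty, x]$,

$$ I_{(-\infty, x]}(x_l) = \begin{cases} 1, & x_l \in (-\infty, x], \\ 0, & x_l \notin (-\infty, x], \end{cases} $$

and $N$ is the sample size.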
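The empirical distribution function (3) can be sketched in a few lines of Python. This is only an illustration: the function name `empirical_cdf` and the sample values are assumptions made for the example and do not come from the paper.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return F_n(x) = (1/N) * #{l : x_l <= x} as a callable."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must be non-empty")

    def F_n(x):
        # bisect_right counts the sample points with x_l <= x, that is,
        # the points for which the indicator of (-inf, x] equals 1.
        return bisect_right(ordered, x) / n

    return F_n


# Illustrative sample (hypothetical values).
F = empirical_cdf([0.3, -1.2, 0.3, 2.0])
print(F(-2.0), F(0.3), F(1.0), F(2.0))  # 0.0 0.75 0.75 1.0
```

The result is a step function that jumps by 1/N at every sample point, which is why differentiating it directly does not give a usable density estimate.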

The problem of solving equation (2) with the function (3), as already noted, belongs to the class of ill-posed problems and requires special techniques. The ill-posedness is especially pronounced for small sample sizes [5]. Yet the need to recover the probability density from a limited amount of data arises frequently, for example in the analysis and segmentation of nonstationary signals, in particular speech signals, whose statistical characteristics can be regarded as constant only over intervals of similar sounds.

Unlike direct methods, indirect methods are based on minimizing the average risk functional, described by expressions of the form

$$ R(\alpha) = \int Q(x, \alpha)\,dF(x), $$

or the corresponding empirical functionals

$$ R_n(\alpha) = \frac{1}{N} \sum_{l=1}^{N} Q(x_l, \alpha). $$

According to this criterion, indirect methods include, for example, the maximum likelihood method [6].
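To illustrate how maximum likelihood fits the empirical risk framework, the sketch below takes the loss $Q(x, \alpha) = -\ln f(x, \alpha)$ and, as an assumption made only for this example, a normal density with $\alpha = (\mu, \sigma)$. For this family the minimizer of the empirical risk is known in closed form (sample mean and uncorrected sample standard deviation). All names and sample values are hypothetical.

```python
import math


def neg_log_gauss(x, mu, sigma):
    """Loss Q(x, alpha) = -ln f(x, alpha) for a normal density, alpha = (mu, sigma)."""
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + (x - mu) ** 2 / (2.0 * sigma ** 2)


def empirical_risk(sample, mu, sigma):
    """Empirical risk: the mean of Q(x_l, alpha) over the sample."""
    return sum(neg_log_gauss(x, mu, sigma) for x in sample) / len(sample)


def ml_gauss(sample):
    """Closed-form minimizer of the empirical risk for the normal family."""
    n = len(sample)
    mu = sum(sample) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in sample) / n)
    return mu, sigma


# Illustrative sample (hypothetical values).
sample = [0.1, -0.4, 1.3, 0.7, -0.2, 0.5]
mu_hat, sigma_hat = ml_gauss(sample)
print(mu_hat, sigma_hat, empirical_risk(sample, mu_hat, sigma_hat))
```

Any other choice of $(\mu, \sigma)$ gives an empirical risk at least as large as the value at `(mu_hat, sigma_hat)`.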

Other methods are also direct in principle, such as histogram techniques and methods based on approximating the $\delta$-function by a regular function in the expression

$$ f(x) = \int_{-\infty}^{\infty} \delta(x - v) f(v)\,dv. \tag{4} $$

However, a clear distinction between direct and indirect methods is, in general, not always possible. This is because in both cases the problem of finding density estimates can be reduced, in one way or another, to the problem of minimizing a functional of the empirical data, in particular of the empirical distribution function.

On kernel and projection estimates of the probability density

The kernel method for obtaining density estimates is based on approximating the $\delta$-function under the integral sign in (4) by a function $K(x)$ defined on some interval of the argument. This function must satisfy the condition

$$ \lim_{h \to 0} \frac{1}{h} K\left(\frac{x}{h}\right) = \delta(x). $$

Expressions such as the following are frequently used as the function $K(x)$:

$$ K(x) = \begin{cases} 1/2, & |x| \le 1, \\ 0, & |x| > 1. \end{cases} $$

The right-hand side of equation (4) after such a substitution is the expectation of the function $\frac{1}{h} K\left(\frac{x - v}{h}\right)$, which can be replaced by the empirical mean. If we take into account that the parameter $h$ is chosen on the basis of the sample size, the probability density estimate in accordance with equation (4) can be written as

$$ \hat f(x) = \frac{1}{n h(n)} \sum_{l=1}^{n} K\left(\frac{x - x_l}{h(n)}\right). \tag{5} $$

The convergence of this expression to the desired density is ensured by the conditions: 1) $h(n) \to 0$ as $n \to \infty$, and 2) for any number $a > 0$ the inequality $\sum_{n=1}^{\infty} e^{-a n h(n)} < \infty$ holds.
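A minimal sketch of the kernel estimate (5) with the rectangular kernel is given below. The function names, the sample, and the two values of the parameter h are assumptions chosen for illustration; the paper does not prescribe an implementation.

```python
def rect_kernel(u):
    """Rectangular kernel: K(u) = 1/2 for |u| <= 1 and 0 otherwise."""
    return 0.5 if abs(u) <= 1.0 else 0.0


def kernel_density(sample, h, kernel=rect_kernel):
    """Return the estimate (5): f(x) = 1/(n*h) * sum_l K((x - x_l) / h)."""
    if h <= 0:
        raise ValueError("h must be positive")
    n = len(sample)

    def f_hat(x):
        return sum(kernel((x - xl) / h) for xl in sample) / (n * h)

    return f_hat


# Illustrative sample with two clusters (hypothetical values).
sample = [-1.0, -0.8, 0.9, 1.1]
f_hat = kernel_density(sample, h=0.5)
f_wide = kernel_density(sample, h=2.0)
print(f_hat(-0.9), f_hat(0.0), f_wide(0.0))
```

With h = 0.5 the estimate vanishes between the two clusters, while with h = 2.0 the clusters merge into a single plateau, so the choice of h directly affects the number of modes that the estimate shows.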

It can be seen from the definition of the function $K(x)$ that decreasing the parameter $h$ increases the accuracy of the approximation of the $\delta$-function, but at the same time increases the chance that the estimate will be erroneously assigned to the class of multimodal densities. Conversely, increasing this parameter may lead to the estimate being erroneously assigned to the class of unimodal densities. The problem of choosing the parameter $h$ that arises in this regard stems from the ill-posedness of the density estimation problem and for this reason has no unique solution. It can only be asserted that estimating unimodal distributions requires larger values of $h$ than estimating multimodal ones.

Equation (4) can also be used when estimating the probability density by the projection method. In this method, the unknown probability density is represented by a polynomial in a system of normalized orthogonal functions $\{\varphi_k(x)\}_1^m$, and the estimate is [1]

$$ \hat f(x) = \sum_{k=1}^{m} a_k \varphi_k(x). $$

Substitution of this polynomial in (4) gives the equation

$$ f(x) = \sum_{k=1}^{m} a_k \varphi_k(x), \qquad a_k = \int_c^d \varphi_k(x) f(x)\,dx \approx \frac{1}{n} \sum_{l=1}^{n} \varphi_k(x_l). \tag{6} $$

Substituting this expression for $a_k$ in (6) leads to the formula

$$ \hat f(x) = \frac{1}{n} \sum_{l=1}^{n} \sum_{k=1}^{m} \varphi_k(x) \varphi_k(x_l). $$

Finally, if we introduce the kernel function $K(x, x_l) = \sum_{k=1}^{m} \varphi_k(x) \varphi_k(x_l)$, the density estimate takes a form similar to (5):

$$ \hat f(x) = \frac{1}{n} \sum_{l=1}^{n} K(x, x_l). $$
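The projection estimate can be sketched as follows. The cosine system used here is an assumption made for the example (the paper does not specify the orthonormal functions), as are the function names, the interval [c, d] = [0, 1], and the sample values.

```python
import math


def cosine_basis(k, x, c, d):
    """k-th function (k = 1, 2, ...) of an orthonormal cosine system on [c, d]."""
    length = d - c
    if k == 1:
        return 1.0 / math.sqrt(length)
    return math.sqrt(2.0 / length) * math.cos((k - 1) * math.pi * (x - c) / length)


def projection_coefficients(sample, m, c, d):
    """Coefficients a_k approximated by the sample mean of phi_k(x_l), k = 1..m."""
    n = len(sample)
    return [sum(cosine_basis(k, xl, c, d) for xl in sample) / n for k in range(1, m + 1)]


def projection_density(sample, m, c, d):
    """Return the estimate f(x) = sum_k a_k * phi_k(x)."""
    a = projection_coefficients(sample, m, c, d)

    def f_hat(x):
        return sum(a[k - 1] * cosine_basis(k, x, c, d) for k in range(1, m + 1))

    return f_hat


def kernel_form(sample, m, c, d, x):
    """Same estimate written as (1/n) * sum_l K(x, x_l)."""
    def K(x, xl):
        return sum(cosine_basis(k, x, c, d) * cosine_basis(k, xl, c, d) for k in range(1, m + 1))
    return sum(K(x, xl) for xl in sample) / len(sample)


# Illustrative sample on [0, 1] (hypothetical values).
sample = [0.12, 0.18, 0.25, 0.71, 0.78, 0.83]
f_hat = projection_density(sample, m=4, c=0.0, d=1.0)
print(f_hat(0.2), kernel_form(sample, 4, 0.0, 1.0, 0.2))
```

For this basis the estimate integrates to one over [c, d] by construction, but a truncated series is not guaranteed to be non-negative, so the first constraint in (1) has to be checked separately.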

The use of projection methods, in which the estimate is represented by formula (6), is not limited to the case considered. There are problems that rely equally on projection methods and on the integral equation (2). In one such approach, the estimate is obtained from smoothed data, where the smoothing is provided by a non-degenerate linear operator of the form

$$ B g(x) = \int_c^d K(x, v) g(v)\,dv. $$

The action of this operator on (2) leads to the equation

$$ G f(x) = Q_n(x), \tag{7} $$

where

$$ G f(x) = \int_c^d K(x, z) \int_c^d I(z - v) f(v)\,dv\,dz, $$

$$ Q_n(x) = B F_n(x) = \frac{1}{n} \sum_{l=1}^{n} \int_{x_l}^{d} K(x, v)\,dv, $$

and (6) is an expansion with respect to the eigenfunctions $\{\varphi_k(x)\}_1^m$ of the operator $G^H G$.

Because equation (7) is ill-posed, its solution is reduced to the problem of minimizing the functional

$$ J(\hat f) = \int_c^d \left[ G \hat f(x) - Q_n(x) \right]^2 dx + \gamma_j \int_c^d \hat f^2(x)\,dx. \tag{8} $$

It is shown that this functional reaches a minimum at the following values of the coefficients of the polynomial (6):

$$ a_k = \frac{\lambda_k}{\lambda_k^2 + \gamma_j}\, b_k, $$

where $b_k = \int_c^d Q_n(x) \varphi_k(x)\,dx$, and $\varphi_k(x)$, $\lambda_k$ are the eigenfunctions and eigenvalues of the operator $G^H G$.

In the particular case when the kernel of the operator $B$ has the form $K(x, v) = K(x - v)$, the operator $B$ becomes a convolution operator:

$$ B f(x) = \int_c^d K(x - v) f(v)\,dv. $$

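This makes it possible to use the Fourier transform when minimizing the functional (8) [1, 6, 7]. The density estimate then takes the form

$$ \hat f(x) = \frac{1}{n} \sum_{l=1}^{n} g_{\gamma_j}(x - x_l). $$

Here the function

$$ g_{\gamma_j}(u) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \tilde g_{\gamma_j}(\omega) e^{j\omega u}\,d\omega $$

is the inverse Fourier transform of

$$ \tilde g_{\gamma_j}(\omega) = \frac{K(\omega) \bar K(\omega)}{K(\omega) \bar K(\omega) + \gamma_j \omega^2}, \qquad K(\omega) = \int_{-\infty}^{\infty} K(u) e^{-j\omega u}\,du. $$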



The histogram estimate of the probability density

A histogram is a bar chart of the distribution of a random variable. The height of each bar represents the number of values of the random variable falling within the corresponding interval; the intervals generally have different widths (see Fig. 1). The ratio of the number $n_l$ of values of the random variable from the interval $(x_{l-1}, x_l]$ to the total number of values $N$ is the empirical probability of the event $x \in (x_{l-1}, x_l]$.

[Fig. 1. Histogram of a mixture of normal distributions.]

The theoretical value of this probability is expressed through the probability density as

$$ P\{x \in (x_{l-1}, x_l]\} = \int_{x_{l-1}}^{x_l} f(x)\,dx. $$

If we equate the theoretical and empirical probabilities and assume that the change in the probability density within each interval can be neglected, the density estimate can be written as

$$ f_l = \frac{n_l}{\Delta x_l N}, \qquad l = 1, \dots, q, \tag{9} $$

where $\Delta x_l = x_l - x_{l-1}$ is the length of the $l$-th interval.

When the domain $(c, d]$ of values of the random variable is split into $q$ equal intervals of length $\Delta x_l = (d - c)/q$, formula (9) can be written as

$$ f_l = \frac{n_l q}{(d - c) N}, \qquad l = 1, \dots, q. \tag{10} $$
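The equal-width histogram estimate (10) can be sketched as follows. The function name `histogram_density` and the sample are assumptions made for the example; sample points that fall exactly on an interval boundary may be misassigned because of floating-point rounding, which a production implementation would need to handle.

```python
import math


def histogram_density(sample, c, d, q):
    """Estimates (10): f_l = n_l * q / ((d - c) * N) on q equal intervals of (c, d]."""
    if q < 1 or d <= c:
        raise ValueError("need q >= 1 and c < d")
    width = (d - c) / q
    counts = [0] * q
    for x in sample:
        if c < x <= d:
            # Index of the half-open interval (x_{l-1}, x_l] that contains x.
            idx = int(math.ceil((x - c) / width)) - 1
            counts[max(0, min(idx, q - 1))] += 1
    n_total = len(sample)
    return [n_l / (width * n_total) for n_l in counts]


# Illustrative sample on (0, 1] (hypothetical values).
f = histogram_density([0.1, 0.2, 0.3, 0.9], c=0.0, d=1.0, q=2)
print(f)  # [1.5, 0.5]
```

When every sample point lies in (c, d], the values f_l multiplied by the interval length sum to one, so the piecewise-constant estimate satisfies the normalization constraint in (1).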

The values obtained by formula (9) or (10) may be used to approximate the probability density. The coefficients of the polynomial corresponding to these estimates can be found, to a first approximation, from the expression

$$ a = \left(\Phi^T\right)^{-1} f, \tag{11} $$

where $f = (f_1, \dots, f_m)^T$ and $\Phi = (\varphi(x_1), \varphi(x_2), \dots, \varphi(x_m))$.

For the estimation one can also use the generalized method of local interpolation. In this method, a sequence of formulas of the form (11) is defined for the corresponding sequence of interpolation intervals. These formulas are supplemented by constraints that provide the necessary conditions for joining the local solutions, and the order of the polynomial is not required to match the number of points $(x_l, f_l)$, that is, $q \ne m$.

Approximating the probability density by smoothing is a least squares problem. The task here is to minimize the sum of squared residuals between the smoothing polynomial and the density estimates $f_l$. The functional to be minimized is written as

$$ J(a) = \sum_{l=1}^{n} \left[ a^T \varphi(x_l) - f_l \right]^2. \tag{12} $$
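Minimizing the functional (12) can be sketched as a solution of the normal equations. The monomial basis $\varphi(x) = (1, x, \dots, x^{m-1})^T$, the function names, and the data points are assumptions made for this example; the paper does not fix the basis.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= factor * M[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_polynomial(xs, fs, m):
    """Minimize J(a) = sum_l (a^T phi(x_l) - f_l)^2 with phi(x) = (1, x, ..., x^(m-1))."""
    phi = [[x ** k for k in range(m)] for x in xs]
    # Normal equations: (sum_l phi phi^T) a = sum_l phi * f_l.
    A = [[sum(p[i] * p[j] for p in phi) for j in range(m)] for i in range(m)]
    b = [sum(p[i] * f for p, f in zip(phi, fs)) for i in range(m)]
    return solve_linear(A, b)


# Illustrative interval centers and density estimates (hypothetical values
# that lie exactly on the parabola 0.9 - 0.3 * x**2).
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
fs = [0.6, 0.825, 0.9, 0.825, 0.6]
a = fit_polynomial(xs, fs, m=3)
print(a)
```

Because there are more points than coefficients, the polynomial smooths the estimates rather than interpolating them; with noisy values f_l the fitted curve would no longer pass through every point.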

To smooth the data, as in interpolation, one can use methods of local approximation, generalized to the required form, in particular to smoothly joined polynomials that are defined on a sequence of intervals and minimize functionals of the form (12) under the constraints set by the joining conditions.

Histogram methods [1, 6] of estimating the probability density, especially those based on interpolation, share a problem inherent in partitioning the set of values of the random variable into intervals when the sample size is small. Fig. 2 a, b shows two histograms of a mixture of normal distributions, the same mixture as in Fig. 1, for a sample of size 100. The figure shows that when the set of values of the random variable is partitioned into 20 intervals (Fig. 2 a), the interpolation approach does not allow the true form of the distribution to be restored or correct conclusions to be drawn. The situation improves when the set is split into 10 intervals (Fig. 2 b). In this case the graph is similar in shape to the true bimodal probability density. This problem can, in principle, be solved by an adaptive partition of the set of values of the random variable into intervals that are not necessarily of the same length. The optimal partition is then found by varying the lengths and centers of the intervals and comparing the results, possibly followed by averaging them.

With local approximation, including generalized local approximation, the partition problem is less acute; on the contrary, what matters is ensuring a number of intervals sufficient for smoothing. However, a new question arises: the choice of an algorithm that ensures the optimal degree of smoothing of the empirical estimates (9). This question can, in principle, be resolved by methods based on varying the free parameters of the algorithm and then selecting the best result according to some evaluation criterion.

Another problem for the interpolation and smoothing approximation methods is ensuring that the estimate $\hat f(x)$ belongs to the class of probability density functions. Within local approximation, these conditions can be taken into account by introducing the corresponding constraints into the problem of minimizing the functional (12), and within the interpolation approach by varying the lengths and centers of the partition intervals.

Finding the coefficients of the polynomial (6) by optimization methods is a linear regression problem. In practice, however, these polynomials are often built on systems of standard probability densities that depend nonlinearly on a certain set of parameters. In this case, if we introduce the vector of parameters $\theta = (\theta_1, \dots, \theta_r)^T$, it is possible to define the polynomial

$$ P(x, a, \theta) = \sum_{k=1}^{m} a_k \varphi_k(x, \theta) = a^T \varphi(x, \theta), $$

where $\varphi(x, \theta) = (\varphi_1(x, \theta), \dots, \varphi_m(x, \theta))^T$.

Accordingly, the density estimate can be written as

$$ \hat f(x) = \sum_{k=1}^{m} \hat a_k \varphi_k(x, \hat\theta) = \hat a^T \varphi(x, \hat\theta), $$

where the parameter estimates are the solution of the minimization problem

$$ (\hat a, \hat\theta) = \arg\min_{a, \theta} \sum_{l=1}^{n} \left[ a^T \varphi(x_l, \theta) - f_l \right]^2. $$

Finding the parameter vectors $a$ and $\theta$ in this case belongs to the class of nonlinear problems, which are usually solved by constrained optimization methods.
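The structure of this nonlinear problem can be illustrated with a deliberately simple sketch. It does not use constrained optimization: instead, it scans a grid of values of a scalar parameter and, for each value, solves the linear least squares problem for the coefficients. The setup is entirely assumed for the example: two normal densities with fixed means and a common standard deviation that plays the role of the nonlinear parameter, and synthetic values f_l generated from a known mixture.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def fit_weights(xs, fs, mus, theta):
    """Linear least squares for a = (a_1, a_2) at fixed theta (2x2 normal equations)."""
    p1 = [normal_pdf(x, mus[0], theta) for x in xs]
    p2 = [normal_pdf(x, mus[1], theta) for x in xs]
    s11 = sum(u * u for u in p1)
    s12 = sum(u * v for u, v in zip(p1, p2))
    s22 = sum(v * v for v in p2)
    b1 = sum(u * f for u, f in zip(p1, fs))
    b2 = sum(v * f for v, f in zip(p2, fs))
    det = s11 * s22 - s12 * s12
    a = ((b1 * s22 - b2 * s12) / det, (b2 * s11 - b1 * s12) / det)
    resid = sum((a[0] * u + a[1] * v - f) ** 2 for u, v, f in zip(p1, p2, fs))
    return a, resid


def fit_mixture(xs, fs, mus, theta_grid):
    """Return (a, theta, residual) with the smallest residual over the grid."""
    best = None
    for theta in theta_grid:
        a, resid = fit_weights(xs, fs, mus, theta)
        if best is None or resid < best[2]:
            best = (a, theta, resid)
    return best


# Synthetic data: values of 0.4*N(-1, 0.5) + 0.6*N(1.5, 0.5) on a grid of points.
mus = (-1.0, 1.5)
xs = [-3.0 + 0.5 * i for i in range(13)]
fs = [0.4 * normal_pdf(x, mus[0], 0.5) + 0.6 * normal_pdf(x, mus[1], 0.5) for x in xs]
a_hat, theta_hat, resid = fit_mixture(xs, fs, mus, [0.3, 0.4, 0.5, 0.6, 0.7])
print(a_hat, theta_hat, resid)
```

The sketch does not enforce the conditions a_k >= 0 and sum a_k = 1; imposing them is what turns the task into the constrained optimization problem mentioned in the text.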

Conclusion

This paper studies direct and indirect methods of estimating the probability density of acoustic signals and interference occurring in information and control telecommunication systems. Kernel and projection models of probability density estimation are investigated; they are based on approximating the probability density of signals in the case of unimodal and multimodal distributions. Applying the histogram method to the estimation of a mixture of normal distributions shows that it is not always possible to restore the true form of the distribution when the set of values of the random variable is partitioned into different numbers of intervals. A solution is provided by an adaptive optimal partition obtained by varying the lengths and centers of the partition intervals.

References

1. Kropotov Y.A., Paramonov A.A. Methods of designing information processing telecommunications systems sharing audio algorithms: monograph. Moscow-Berlin: Direct Media, 2015. 226 p. (in Russian).

2. Kropotov Y.A. The time interval determine the probability distribution of the amplitude of the speech signal law. Radiotekhnika, 2006. № 6. pp. 97-98 (in Russian).

3. Ermolaev V.A., Kropotov Y.A. About correlation estimating model parameters of acoustic echo. Questions electronics, Vol. 1. № 1. pp. 46-50 (in Russian).

4. Kropotov Y.A., Bykov A.A. Algorithm acoustic noise suppression and interference with concentrated formant distribution rejection bands. Questions electronics, 2010. Vol. 1. № 1. pp. 60-65 (in Russian).

5. Kropotov Y.A., Bykov A.A. Approximation of law probability distribution of acoustic noise signal samples. Radio engineering and telecommunication systems, 2011. № 2. pp. 61-67 (in Russian).

6. Ermolaev V.A., Eremenko V.T., Karasev O.E., Kropotov Y.A. Identification of model of discrete linear systems with variable, slowly varying parameters. Radio Engineering and Electronics, 2010. Vol. 55. № 1. pp. 57-62 (in Russian).

7. Ermolaev V.A., Karasev O.E., Kropotov Y.A. Interpolation method filtration in problems of speech signal processing in the time domain. Journal of Computer and Information Technology, 2008. № 7. pp. 12-17 (in Russian).