Introduction

Piecewise Continuous Segmentation of Multidimensional Experimental Signals

0 Alexander G. Dmitriev Cherepovets Higher Military Engineering Schoolof Radio Electronics Cherepovets , Russia

41 45

An algorithm for piecewise continuous approximation of structural experimental multidimensional signals with a previously unknown number of intervals for splitting signals into "similar" fragments is proposed. The construction of a multidimensional piecewise continuous approximating function is performed "left - to - right", which allows to use the dynamic programming method to determine the boundaries of the partition intervals. The approximation quality criterion is used, taking into account the number of data on the partition intervals and the "complexity" of the local signal models used.

Introduction

In various applications, the problem arises of analyzing the so-called structural experimental signals, considered as a time-sequential combination of simpler signals (functions), that have constant properties at the corresponding time intervals [Mot79, Kos04]. Processing of such signals in most cases is reduced to a two-stage procedure: the allocation of "similar" fragments (segmentation stage) and the subsequent construction of the description of the presented signals as a whole. The use of existing methods of signal segmentation proves to be insufficiently effective in studies under conditions of high dimensionality, limited experimental observations and an unknown number of "similar" sites (partition intervals). In addition, the approximating function usually suffers a gap at the boundaries of "similar" sites [Dmi10, Dor84].

The aim of the work is to develop an algorithm for piecewise continuous approximation of multidimensional signals with a previously unknown number of intervals of splitting signals into "similar" fragments, which under certain conditions delivers the optimal value to the selected criterion of approximation quality.

1 Problem Statement

Let a set of signals (a multidimensional signal) y(t) = ( y(1) (t),..., y(s) (t)) collectively characterizing the object under study, be presented for analysis. The values y(i) (t), i = 1,..., s are specified at discrete points in time t = t1,..., tN . The criterion of approximation quality J on the sample of experimental values is chosen as:

s r J = å µi å i=1 j=1 n j n j -m [y(i) (tl ) - Fj(i) (tl , a(ji) )]2 , (1) where µi - a priori "weight" of the signal y(i) , n j the number of discrete samples on the interval m (Tj-1,Tj ], Fj(i) (tl , a(ji) ) = å a(jik)jk (t) - polynomial over k =1 a given set of basis functions {j k (t), k = 1,..., m}, a (ji) = (a (j1i) ,...,a (jmi) ) - the vector of estimated parameters in j-th interval.

Criterion (1) uses weights µi (i = 1,..., s) and n j / (n j - m) ( j = 1,..., r) . The introduction of weights µi is due to the fact that in practice the parameters often have different practical significance. Specific values µi are chosen for meaningful reasons, usually they are s normalized so that å µi = 1 . Weights i=1 n j / (n j - m) are the usual normalizing coefficients that take into account the dimension of the model. It is required to find such a partition T = (T0 ,T1,...,Tr ), T0 < T1 < ... < Tr (synchronous change of signals y(i) is assumed) of a given interval [t1, tN ], T0 = t1, Tr = tN into r intervals (Tj-1,Tj ], j = 1,..., r ( r - in general, unknown) and to determine on each of these intervals such values of the parameter vectors a (ji) , i = 1,..., s, that the functional (1) takes a minimum value under the condition of restrictions on the continuity of approximating functions:

Fj(i) (Tj ,a (ji) ) = Fj(+i1) (Tj ,a (j+i)1 ), ( j = 1,..., r); (i = 1,..., s).

2 Algorithm

Denote by-the e (i) (Tj-1,Tj ) error of approximation of the i-th signal on the interval (Tj-1,Tj ] , calculated by the method of least squares. Then criterion (1) will take the form:

s r n J = åµi å j e (i) (Tj-1,Tj )

i=1 j=1 nj -m Let's make the following assumption. Borders Tj ( j = 1,..., r -1), partitions

T = (T0 ,T1,...,Tr ) and corresponding local approximating functions on each interval of partition will be found from "left to right". Let the position of the boundary T1 be determined, find the local approximating functions F (i) (t, a1(i) ), 1 delivering the minimum e (i) (T1,T2 ), i = 1,..., s . Next, we fix the position of the boundary T1, find the boundary T2 and the corresponding local approximating functions F2(i) (t, a(2i) ), delivering a minimum of e (i) (T1,T2 ) (i = 1,..., s) with a restriction on continuity at the boundary T1 : F1(i) (T1, a1(i) ) = F2(i) (T1, a(2i) ) , i = 1,..., s . (2) (3)

Fix T2 , etc.

Under this assumption criterion (3) with constraints (2) has the following property. Let be Tj* some fixed position of the right boundary of the j-th interval.

Then the bounds T1*,...,Tj*-1, obtained by minimizing (3) only over the bounds T1,...,Tj-1 are independent of the boundary values Tj+1,...,Tr-1. Indeed, the functional (3) can be represented as the sum of two nonnegative quantities: J = Ja + Jb , where Ja = Ja (T1,...Tj-1 / Tj*) = íìïåj nk åSµie (i) (Tk -1,Tk )ýïïü, ïîk =1 nk -m i=1 þ Tj = Tj*, Jb = Jb (Tj+1,...Tr-1 / Tj*) = ìïí år nk åSµie (i) (Tk-1,Tk )üïïý, ïîk= j+1 nk -m i=1 þ Tj = Tj*.

But then, obviously, arg min J = arg min Ja .

T1,...,Tj-1 T1,...,Tj-1 It follows from this property that if Tj* is the optimal position of the j-th boundary, then the boundaries T1*,...,Tj*-1, obtained by minimizing J by T1,...,Tj-1, are also optimal. The considered property of criterion (3) makes it possible to use the dynamic programming procedure [Bel69] to determine the optimal boundaries of intervals.

Let the number of intervals be equal to r0 . The following recursive algorithm finds the partition and local approximating functions, delivering the optimal value to the functional (1) (under the above assumption).

First the functions

J j (Tj ) ( j = 1, 2,..., r0 -1) are tabulated sequentially, where J1(T1) = J j (Tj ) = n n-1 m åi=S1 µie (i) (T0 ,T1 ), T1 = tp ,...,tN -(r0 -1) p ; 1 ì ï min íJ j-1 (Tj-1 ) + Tj-1=t( j-1) p , ïî ...,tTj¢ - p njn-j m åi=S1 µie (i) (Tj-1,Tj )ïþýïü, T j¢ - the reference number corresponding to the boundary T j , p is

the specified minimum allowed number of samples on the partition interval. Approximating functions on adjacent intervals are constructed taking into account the conditions for continuity (2).

At the same time, the values M j-1 (Tj ), Tj = t j× p ,..., tN -(r0 - j) p , j = 2,..., r0 -1, the values of the optimal positions of the boundaries Tj -1 for each T j , - are stored. Next, the optimal boundaries of the intervals are determined:

* Tr0 -1 = ìïïJr0 -1 (Tr0 -1 ) + ü ï ï Tr0 -1=t(arr0g-2m)× ipn,...,tN - p íï+ nr0 åsµie (i) (Tr0 -1,tN )ïý, (4) ïî nr0 - m i=1 þï Tr0*-2 = M r0 -2 (Tr0*-1 ) ,..., T1* = M1 (T2* ).

The extreme nature of the dependence J on r is used to find the partition T H = (T0 ,T1H ,...,TrHH-1 ,Tr ) at an unknown number of intervals. Indeed, with an increase r , on the one hand, there is an increase in weights n j / (n j - m +1) due to a decrease in the average

n j , which, other things being equal, leads to an increase in the criterion (1). On the other hand, as the number of intervals increases, the usual quadratic residual decreases, resulting in a decrease in (1). The simultaneous action of these factors leads to the fact that the functional (1) reaches its minimum value at some intermediate (not boundary) value rH .

For determination rH

it is possible to use the approach offered in work [Dmi10]. First, the minimum values of the functional (1) J j (tN ), j = 2,..., rmax are calculated . At the same time, the values M j-1 (Tj ) of the optimal positions of the boundaries Tj -1 for each Tj are stored . Next, the number of intervals is selected and the optimal boundaries are determined. As rH the smallest number of intervals r is chosen, at which Jr (tN ) takes the minimum value.

Optimal bounds T1H ,...,TrHH-1 are determined from expressions analogous to (4): TrHH-1 = M rH -1 (tN ) , TrHH-2 = M rH -2 (TrHH-1 ) , ..., T H = M1 (T2H ).

1 The value rmax is Statistical Modeling chosen for substantive or reasons. In particular, as rmax can be used the value [N / p] , where [ x] is the integer part x .

To check the efficiency of the algorithm, its modeling was carried out on special multidimensional signals. The developed program made it possible to obtain a multidimensional signal with specified properties. The relevant procedure is as follows.

The vector function is considered on the interval: [1, N ] :

y (t ) = ( y(1) (t ) ,..., y(S ) (t )) , where y(i) (t ) = år* e j íìï y(ji) ×Tj-1 - y(ji-)1 ×Tj + ( y(ji-)1 - y(ji) )t ýüï + b ×xt (5) j=1 ïî Tj-1 - Tj ïþ (i = 1,..., s) is a superposition of piecewise linear continuous function and independent Gaussian interference with zero mean and dispersion b2 ; T0 = 1, Tr* = N , Tj , j = 1,..., r* -1 - nodal points of piecewise linear vector function; y(ji) , j = 0,..., r* values of this function in points

Tj ; e j characteristic

function equal to one, if t Î (Tj-1,Tj ùû , and equal to zero, if t Ï (Tj-1,Tj ùû .

The nodal points Tj , the values of piecewise linear functions y(i) at these points , the number of intervals j r* are determined randomly using a random number generator by the following procedure: 1. j = 0; T0 = 1; y0(i) = V 0i , i = 1,..., S. 2. If N - Tj < p , then move to 4. Otherwise - we pass to 3. 2.

V ji , b j j = j +1; Tj = Tj-1 + éëb jc + pùû , ( [ x] is the integer part x ), y(ji) = V ji × d, i = 1,..., s, then move to Tj = N ; r* = j .

- random, independent, uniformly distributed, respectively, at intervals [-1;1] and [0;1] values; p, d , c, N - pre-selected parameters of the algorithm.

Next, the values y(i) (t), (t = 1, 2,..., N), (i = 1,..., s) of the vector function (5) are calculated.

Linear functions were used as local approximating functions:

Fj(i) (t,a (ji) ) = a (j1i) +a (ji2) × t , i = 1,..., s , j = 1,..., r*. In the studies, the experimental material contained three groups of three-component multidimensional signals, obtained using the procedure described above. The first group consisted of multidimensional signals without noise, the second and third, respectively, with average (or low) (dispersion b2 Î[0, 01- 0,1]) and increased noise level ( b2 Î[0,1- 0, 3]). As expected, for no-noise the algorithm found the desired number of intervals, partitioning, and approximating functions without error.

Typical dependences of the criterion J on r for the second signal group are shown in figures 1 (the sign in the figures indicates the actual number of intervals).

J ´0,1 For signals with an average noise level, the minimum functional, as a rule, falls on the desired number of intervals.

Typical dependences of the criterion J on r for the third signal group are shown in figures 2. For signals with an increased noise level, the discrepancy between the optimal and the desired number of intervals was more often manifested. For signals with high noise levels were more likely to be a mismatch is found and the specified numbers of intervals.

For both groups of signals, this discrepancy was observed when there were adjacent intervals in the multidimensional signal, at which the difference in the" behavior " of the signal was insignificant, and, as a rule, such intervals contained a small number of samples.

It can also be noted, that for the signals of the second group, the reduction section of the graph of the dependence of J on r jumps into the increase section, and for signals with an increased noise level, such a transition occurs smoothly. The latter can be used to construct optimization procedures that perform a limited search over r.bjhb

Conclusion

Thus, the proposed approach of constructing a piecewise continuous approximating function "left – to - right" allows to apply the dynamic programming method to determine the boundaries of the partition intervals. Using the extreme behavior of the approximation quality criterion, the number of partition intervals is determined.

[Mot79]

V. V.

Mottl ,

I. B.

Muchnik . Hidden Markov models in structural analysis of signals . M.: Fizmatlit , 1999 .

[Dmi10]

A. G.

Dmitriev. The Algorithm of optimal structural approximation of experimental multidimensional signals . Naukoemkie technologii, No 9 : 31 - 35 , 2010 .

[Kos04]

A. A.

Kostin ,

O. V.

Krasotkina ,

M. V.

Markov ,

V. V.

Mottl ,

I. B.

Muchnik . Dynamic programming algorithms for analysis of non-stationary signals . Computational Mathematics and Mathematical Physics , V 44 , No 1: 62 - 77 , 2004 .

[Dor84]

A. A.

Dorofeyuk ,

A. G.

Dmitriev . Methods for piecewise approximation of multidimensional curves. Automatics and telemechanics , No 12 : 101 - 108 , 1984 .

[Bel69]

Bellman ,

Dreyfus . Applied problems of dynamic programming . M: Nauka , 1969 .