=Paper= {{Paper |id=None |storemode=property |title=Rough Set Flow Graphs and Ant Based Clustering in Classification of Disturbed Periodic Biosignals |pdfUrl=https://ceur-ws.org/Vol-928/0269.pdf |volume=Vol-928 |dblpUrl=https://dblp.org/rec/conf/csp/PancerzLTW12 }} ==Rough Set Flow Graphs and Ant Based Clustering in Classification of Disturbed Periodic Biosignals== https://ceur-ws.org/Vol-928/0269.pdf
        Rough Set Flow Graphs and Ant Based
        Clustering in Classification of Disturbed
                  Periodic Biosignals

    Krzysztof Pancerz1 , Arkadiusz Lewicki1 , Ryszard Tadeusiewicz2 , and Jan
                                    Warchol3
              1
                  University of Information Technology and Management
                      Sucharskiego Str. 2, 35-225 Rzeszów, Poland
                        {kpancerz,alewicki}@wsiz.rzeszow.pl
                     2
                        AGH University of Science and Technology
                      Mickiewicza Av. 30, 30-059 Krakow, Poland
                                    rtad@agh.edu.pl
                             3
                                Medical University of Lublin
                       Jaczewskiego Str. 4, 20-090 Lublin, Poland
                                 jan.warchol@umlub.pl



       Abstract. In the paper, we are interested in classification of disturbed
       periodic biosignals. An ant based clustering algorithm is used to group
       episodes into which examined biosignals are divided. Disturbances in
       periodicity of such signals cause some difficulties in formation of coherent
       clusters of similar episodes. A quality of a clustering process result can
       be used as an indicator of disturbances. A local function in the applied
       clustering algorithm is calculated on the basis of temporal rough set flow
       graphs representing an information flow distribution for episodes.

       Keywords: rough set flow graphs, ant based clustering, biosignals


1    Introduction
An increasing interest is now evident in classification and clustering for time
series and signals using a variety of methodologies (cf. [5]). One of the frequently
examined problems concerns analysis of biosignals (cf. [6], [12]). In our research,
we consider a problem of classification of voice signals in order to detect some
disturbances indicating the possible presence of laryngeal pathologies.
    Periodicity is one of the main features of some biosignals (e.g., voice, ECG).
However, in case of some diseases, this periodicity can be disturbed. Especially,
it is expressed by different shapes in selected time windows corresponding to
periods of biosignals. The main idea of the proposed approach is based on recog-
nition of temporal patterns and their replications in a selected fragment of the
signal being examined. In case of some disturbances, temporal patterns cannot
be found. This problem has been considered by us in examination of a voice
signal for a non-invasive diagnosis of selected larynx diseases. We have used dif-
ferent approaches to solve this problem, for example, recurrent neural networks
[13], [14], [15], mining unique episodes in temporal information systems [10], ant
based clustering with similarity measures [9].
    In this paper, an ant based clustering algorithm working on the basis of tem-
poral rough set flow graphs is proposed. Rough set flow graphs [11] introduced
by Z. Pawlak are a useful tool for the knowledge representation. We use them as
a tool for representing the knowledge of transitions between consecutive samples
of examined signals. A quality of a clustering process result can be used as an
indicator of disturbances in periodicity of biosignals. If episodes included in time
windows corresponding to periods of the examined signal are similar (there is
a lack of non-natural disturbances), then they should be grouped into very co-
hesive clusters. If significant replication disturbances in time appear, then time
windows are grouped in several clusters or they are scattered on the grid with-
out any distinct groups. To provide such a classification ability, we use a special
algorithm for clustering a set of well categorized objects, originally proposed by
us in [8]. A set of well categorized objects is characterized by a high similarity
of objects within classes and a relatively high dissimilarity of objects between
different classes. The algorithm is based on versions of ant based clustering al-
gorithms proposed earlier by Deneubourg, Lumer and Faieta as well as Handl
et al. Temporal rough set flow graphs representing an information flow distribu-
tion for episodes are used in the clustering algorithm for calculation of the local
function.


2   Episode Information Systems and Temporal Rough Set
    Flow Graphs

In this section, we recall the basic concepts concerning information systems and
rough set flow graphs which are crucial to understand the approach proposed in
the paper as well as we introduce a notion of episode information systems and
show how to create rough set flow graphs for such systems.
    An information system is a pair S = (U, A), where U is a set of objects, A is
a set of attributes, i.e., a : U → Va for a ∈ A, where Va is called a value set of a.
Any information system can be represented as a data table, whose columns are
labeled with attributes, rows are labeled with objects, and entries of the table
are attribute values.
    In our approach, we will use information systems to represent time series
data. Biosignals recorded in electronic devices have a digital form and they can
be treated as time series, i.e., sequences of signal samples. Each consecutive
sample represents a value of the signal at a given time instant. Therefore, we
assume that a set of attributes in an information system is ordered in time, i.e.,
A = {at : t = 1, 2, . . . , n}, where at is the attribute determining values of the
signal at time instant t. Each object in such a system is said to be an episode
and the whole system will be called an episode information system and denoted
by Se . An example of the episode information system Se = (E, A), where E is
a nonempty finite set of episodes and A is a nonempty finite set of attributes
ordered in time, is shown in Figure 1. This system includes five episodes of real
voice signals and each episode consists of five consecutive samples.


             Table 1. An example of the episode information system Se

               E/A    a1        a2        a3        a4        a5
                e1 -9362.00 -10168.00 -11222.00 -11749.00 -12400.00
                e2 -9734.00 -10261.00 -11346.00 -12090.00 -12524.00
                e3 -10261.00 -11160.00 -11656.00 -12524.00 -13268.00
                e4 -9672.00 -10664.00 -11191.00 -11718.00 -12400.00
                e5 -8990.00 -9517.00 -10013.00 -10664.00 -10912.00



    Rough set flow graphs have been defined by Z. Pawlak [11] as a tool for rea-
soning from data. A flow graph is a directed, acyclic, finite graph G = (N, B, σ),
where N is a set of nodes, B ⊆ N × N is a set of directed branches and
σ : B → [0, 1] is a flow function.
    An input of a node x ∈ N is the set I(x) = {y ∈ N : (y, x) ∈ B}, whereas
an output of a node x ∈ N is the set O(x) = {y ∈ N : (x, y) ∈ B}. σ(x, y) is
called a strength of a branch (x, y) ∈ B. We define the input and the output of
the graph G as I(G) = {x ∈ N : I(x) = ∅} and O(G) = {x ∈ N : O(x) = ∅},
respectively. I(G) and O(G) consist of external nodes of G. The remaining nodes
of G are its internal nodes.
    With each node x ∈ N we associate its inflow δ+ (x) and outflow δ− (x)
defined by:
                P
  – δ+ (x) =         σ(y, x),
              y∈I(x)
                P
  – δ− (x) =         σ(x, y).
              y∈O(x)

For each node x ∈ N , its throughflow δ(x) is defined as follows:
                             
                              δ− (x)          if x ∈ I(G),
                    δ(x) = δ+ (x)              if x ∈ O(G),                          (1)
                               δ− (x) = δ+ (x) otherwise.
                             

   With each branch (x, y) ∈ B we may also associate its certainty cer(x, y) =
σ(x,y)
 δ(x) , where δ(x) 6= 0.
   A directed path [x . . . y] between nodes x and y in G, where x 6= y, is a
sequence of nodes x1 , x2 , . . . , xn such that x1 = x, xn = y and (xi , xi+1 ) ∈ B,
where 1 ≤ i ≤ n − 1. For each path [x1 . . . xn ], we define its certainty as
                    n−1
                     Q
cer[x1 . . . xn ] =     cer(xi , xi+1 ). In our approach, we will use different operators
                 i=1
(not only product) for calculating a certainty of a given path.
   To represent an information flow distribution in an episode information sys-
tem we can use rough set flow graphs. A very similar problem has been considered
 Algorithm 1: Algorithm for creating a temporal rough set flow graph corre-
 sponding to an episode information system
   Input : An episode information system Se = (E, A), where
             A = {at : t = 1, 2, . . . , n}.
   Output: A temporal rough set flow graph G = (N, B, σ, cer) corresponding to
             Se .
   N ←− ∅;
   B ←− ∅;
   for each i = 1, 2, . . . , n do
       Ni ←− ∅;
       for each attribute value v of ai do
           Create a node nv representing v and add it to Ni ;
       end
       N ←− N ∪ Ni ;
   end
   for each i = 1, 2, . . . , n − 1 do
       for each node nv ∈ Ni do
           for each node nw ∈ Ni+1 do
               Create a branch b = (nv , nw );
                             card(E(ai ,v)∩E(ai+1 ,w))
               σ(b) ←−               card(E)
                                                       ;
                         card(E(a ,v)∩E(a
                             i        i+1   ,w))
            cer(b) ←−     card(E(ai ,v))
                                                   ;
            B ←− B ∪ {b};
         end
      end
   end




in [4]. The algorithm presented in this section is modeled on that presented in
[4]. Let Se = (E, A), where A = {at : t = 1, 2, . . . , n}, be an episode informa-
tion system. A rough set flow graph G corresponding to Se consists of n layers.
Nodes in the i-th layer of G represent values determined by the attribute ai ,
where i = 1, 2, . . . , n. Since the attribute set A is ordered in time, we can call
the graph G as a temporal rough set flow graph. It represents temporal flow
distribution in an episode information system. In order to construct a temporal
rough set flow graph G corresponding to an episode information system Se we
may perform Algorithm 1. A graph constructed using this algorithm is supple-
mented with a certainty function which assigns a number called certainty from
the interval [0, 1] to each branch in G. Hence, we have G = (N, B, σ, cer), where
N is a set of nodes, B is a set of directed branches, σ : B → [0, 1] is a flow func-
tion, and cer : B → [0, 1] is a certainty function. Certainty factors of branches
will be used in our approach presented here.

    Let Se = (E, A), where A = {at : t = 1, 2, . . . , n} be an episode information
system. By E(ai , v) we denote the set of all episodes in E for which the attribute
ai has the value v.
3    Ant Based Clustering Algorithm

Ant based clustering of time series was considered by us in [7]. Next, the approach
presented there has been modified in our last work [8]. Here, we briefly remind
this approach. It is based on algorithms proposed earlier by Deneubourg [1],
Lumer and Faieta [3] as well as Handl et al. [2] with some our modifications. A
set of steps is formally listed in Algorithm 2. In this algorithm:

 – E is a set of episodes being clustered (the size of E is n),
 – N is a number of iterations performed for the clustering process,
 – Ants is a set of ants used in the clustering process,
 – ppick (e) and pdrop (e) are probabilities of the picking and dropping operations
   made for a given episode e, respectively (see Formulas 3 and 4).

     Let P be a set of all places of the square grid G on which objects are scattered.
Each place p of G is described by two coordinates, i and j, written as p(i, j). The
neighborhood π(p) of p(i, j), where a given object is foreseen to be dropped or
whence it is foreseen to be picked up, is defined as a square surrounding p(i, j),
i.e.:
          π(p) = {p0 (i0 , j 0 ) ∈ P : abs(i0 − i) <= r and abs(j 0 − j) <= r},
where r is a radius of perception of ants and abs denotes the absolute value.
    For the neighborhood π(p), a local episode information system Se (p) is cre-
ated. Se (p) includes all episodes placed in π(p). Next, on the basis of Se (p), a
temporal rough set flow graph T RSF G(p) is built using Algorithm 1. T RSF G(p)
serves as a base for calculating a local function value.
    Let Se (p) = (E, A), where A = {at : t = 1, 2, . . . , n}, be a local episode
information system and e∗ be the episode designated to pick up from or to drop
at the place p. If e∗ is described by the following attribute values: a1 (e∗ ) = v1 ,
a2 (e∗ ) = v2 , ..., an (e∗ ) = vn , then a local function floc (e∗ ) is calculated as:
                                       n−1
                           floc (e∗ ) = Op cer(nivi , ni+1
                                                       vi+1 ),                     (2)
                                       i=1

where:

 – Op is an aggregation operator, for example, median, arithmetic average,
   average weighted with different ways, etc.,
 – nivi is the node in the i-th layer of T RSF G(p) corresponding to value vi ,
 – ni+1
     vi+1 is the node in the (i + 1)-th layer of T RSF G(p) corresponding to value
   vi+1 .

It is easy to see that a local function is calculated as the aggregation of certainty
factors of branches belonging to the path in T RSF G(p) determined by attribute
values of the episode e∗ .
    To avoid forming smaller clusters of very similar episodes, the threshold den-
sity minDens is used. The density dens(e) of the neighborhood π(p) for the
episode e designated to pick up from or to drop at the place p is calculated as a
ratio of a number of all episodes placed in the neighborhood π(p) to a number
of all places in the neighborhood π(p).
    Picking and dropping decisions for the episode e can be formally expressed
by the following threshold formulas:
                         (
                           1                          if floc (e) <= ϑpick
                                                                      sim ,
             ppick (e) =       1                    2                       (3)
                               pick 2 (floc (e) − 1) otherwise
                           (1−ϑ    )
                                  sim


and                           (
                                  1                 if floc (e) >= ϑdrop
                                                                    sim ,
                pdrop (e) =            1         2                            (4)
                                    drop 2 floc (e) otherwise.
                                  (ϑ    )
                                      sim


   Thresholds ϑpick        drop
                 sim and ϑsim for picking and dropping operations, respectively,
are fixed. These values are selected experimentally for given sets of signals.


4     Classification Procedure

It was mentioned in the introduction that we are interested in periodical biosig-
nals. The classification procedure proposed in this paper is as follows. We divide
a given fragment of a periodic biosignal into time windows. Each time window
represents one period of the signal. A signal included in one time window is
called an episode. The set of episodes is used in the clustering process.
    In the experiments, voice signals collected by J. Warchol [16] were examined.
Examples of two voice signals, divided into episodes are shown in Figures 1
and 2, respectively. The first signal does not include disturbances of periodicity
whereas the second one includes them. For comparison, we also present a pure
sine signal (see 3) divided into episodes.
    Before clustering, we apply some pre-processing procedures:

1. Values of signal samples are normalized to the interval [−1.0, 1.0].
2. Each episode is transformed into the so-called delta representation, i.e., val-
   ues of samples have been replaced with differences between values of current
   samples and values of previous samples. After transformation, each episode
   is a sequence consisting of three values: -1 (denoting decreasing), 0 (denot-
   ing a lack of change), 1 (denoting increasing) (cf. [7]). This transformation
   enables us to obtain a multistage decision transition system with discrete
   values.

   Experiments for the ant based clustering have been performed with the fol-
lowing parameters:

 – grid size: 100 × 100,
 – no. of ants: 5,
 – radius of perception: 2,
 – no. of iterations: 10000,
 – aggregation operator : arithmetic average.
Algorithm 2: Algorithm for Ant Based Clustering of Episodes
 for each episode ei ∈ E do
     Place ei randomly on a grid G;
     Set ei as dropped;
 end
 for each ant aj ∈ Ants do
     Place aj randomly on a grid place occupied by one of episodes from E;
     Set aj as unladen;
     workT ime(aj ) ← 0;
 end
 for k ← 1 to N do
     for each ant aj ∈ Ants do
         if aj is unladen then
             if place of aj is occupied by episode e then
                  Draw a random real number r ∈ [0, 1];
                  if dens(e) < minDens or r ≤ ppick (e) then
                      set e as picked;
                      set aj as carrying the episode;
                  else
                      move aj randomly to another place occupied by one of
                      episodes from E;
                  end
             else
                  move aj randomly to another place occupied by one of episodes
                  from E;
             end
         else
             Draw a random real number r ∈ [0, 1];
             if r ≤ pdrop (e) then
                  move e carried by aj randomly to a new place on a grid;
                  set e as dropped;
                  set aj as unladen;
                  workT ime(aj ) ← 0;
             else
                  workT ime(aj ) ← workT ime(aj ) + 1;
             end
         end
         if workT ime(aj ) > maxW orkT ime then
             set e carried by aj as dropped;
             set aj as unladen;
             move aj randomly to another place occupied by one of episodes
             from E;
             workT ime(aj ) ← 0;
         end
     end
     increase minDens;
 end
     Fig. 1. Selected episodes of the signal without disturbances of periodicity




       Fig. 2. Selected episodes of the signal with disturbances of periodicity




After a clustering process, we expect to have two distinguishable situations. If
episodes are similar (there are no disturbances), then they are grouped into very
cohesive clusters. If significant replication disturbances appear, then episodes are
grouped in several dispersed clusters or they are scattered on the grid. A result
of clustering is an indicator used to classify the examined biosignal.
    Exemplary results of ant based clustering, for the signals without and with
disturbances of periodicity, are shown in Figures 4 and 5, respectively. For com-
                   Fig. 3. Selected episodes of the pure sin signal



parison, we also present a result of ant based clustering for a pure sine signal
(see Figure 6).




    Fig. 4. The result of clustering the signal without disturbances of periodicity




5   Conclusions
In the paper, the first attempt to application of ant based clustering based on
temporal rough set flow graphs for classification of disturbed periodic biosignals
     Fig. 5. The result of clustering the signal with disturbances of periodicity




                 Fig. 6. The result of clustering the pure sin signal



has been presented. At the beginning, we were interested in simple classification,
i.e., periodicity of the examined signal is disturbed or no. In the future, we plan
to make more detailed analysis of results of a clustering process. This should
give us additional information about the scale and character of disturbances.


Acknowledgments

This paper has been partially supported by the grant No. N N519 654540 from
the National Science Centre in Poland.


References
 1. Deneubourg, J., Goss, S., Franks, N., Sendova-Franks, A., Detrain, C., Chrétien,
    L.: The dynamics of collective sorting: Robot-like ants and ant-like robots. In:
    Proceedings of the First International Conference on Simulation of Adaptive Be-
    haviour: From Animals to Animats 1. pp. 356–365. MIT Press, Cambridge, MA
    (1991)
 2. Handl, J., Knowles, J., Dorigo, M.: Ant-based clustering and topographic mapping.
    Artificial Life 12(1), 35–62 (2006)
 3. Lumer, E., Faieta, B.: Diversity and adaptation in populations of clustering ants.
    In: Proceedings of the Third International Conference on Simulation of Adaptive
    Behaviour: From Animals to Animats 3. pp. 501–508. MIT Press, Cambridge, MA
    (1994)
 4. Matusiewicz, Z., Pancerz, K.: Rough set flow graphs and max - * fuzzy relation
    equations in state prediction problems. In: Chan, C.C., Grzymala-Busse, J.W.,
    Ziarko, W. (eds.) Rough Sets and Current Trends in Computing. Lecture Notes
    in Computer Science, vol. 5306, pp. 359–368. Springer-Verlag, Berlin Heidelberg
    (2008)
 5. Mitsa, T.: Temporal Data Mining. CRC Press (2010)
 6. Nait-Ali, A. (ed.): Advanced Biosignal Processing. Springer-Verlag, Berlin Heidel-
    berg (2009)
 7. Pancerz, K., Lewicki, A., Tadeusiewicz, R.: Ant based clustering of time series
    discrete data - a rough set approach. In: Panigrahi, B.K., et al. (eds.) Swarm,
    Evolutionary, and Memetic Computing, Lecture Notes in Computer Science, vol.
    7076, pp. 645–653. Springer-Verlag, Berlin Heidelberg (2011)
 8. Pancerz, K., Lewicki, A., Tadeusiewicz, R.: Ant based clustering of two-class sets
    with well categorized objects. In: Greco, S., Bouchon-Meunier, B., Coletti, G.,
    Fedrizzi, M., Matarazzo, B., Yager, R. (eds.) Advances in Computational Intelli-
    gence, Communications in Computer and Information Science, vol. 299, pp. 241–
    250. Springer-Verlag, Berlin Heidelberg (2012)
 9. Pancerz, K., Lewicki, A., Tadeusiewicz, R., Szkola, J.: Classification of speech
    signals through ant based clustering of time series. In: Computational Collective
    Intelligence Technologies and Applications. Lecture Notes in Artificial Intelligence,
    Springer-Verlag, Berlin Heidelberg (2012), to appear
10. Pancerz, K., Paja, W., Wrzesień, M., Warchol, J.: Classification of voice signals
    through mining unique episodes in temporal information systems: A rough set
    approach. In: Proceedings of the 21th international Workshop on Concurrency,
    Specification and Programming (CS&P 2012). Berlin, Germany (2012), to appear
11. Pawlak, Z.: Flow graphs and data mining. In: Peters, J., Skowron, A. (eds.) Trans-
    actions on Rough Sets III, pp. 1–36. Springer-Verlag, Berlin Heidelberg (2005)
12. Semmlow, J.: Biosignal and Medical Image Processing. CRC Press (2009)
13. Szkola, J., Pancerz, K., Warchol, J.: Computer-based clinical decision support for
    laryngopathies using recurrent neural networks. In: Hassanien, A., et al. (eds.)
    Proc. of the ISDA’2010. pp. 627–632. Cairo, Egypt (2010)
14. Szkola, J., Pancerz, K., Warchol, J.: Computer diagnosis of laryngopathies based
    on temporal pattern recognition in speech signal. Bio-Algorithms and Med-Systems
    6(12), 75–80 (2010)
15. Szkola, J., Pancerz, K., Warchol, J.: Recurrent neural networks in computer-based
    clinical decision support for laryngopathies: An experimental study. Computational
    Intelligence and Neuroscience 2011 (2011), article ID 289398
16. Warchol, J.: Speech Examination with Correct and Pathological Phonation Using
    the SVAN 912AE Analyser (in Polish). Ph.D. thesis, Medical University of Lublin
    (2006)