CEUR Workshop Proceedings, Vol-2962. https://ceur-ws.org/Vol-2962/paper35.pdf
2-adic Fuzzy Partitions and Multi-Scale Representation of Time Series

Irina Perfilieva and David Adamczyk

University of Ostrava, 30. dubna 22, Czech Republic
irina.perfilieva@osu.cz, david.adamczyk@osu.cz
WWW home page: https://ifm.osu.cz/irina-perfiljeva/13263/

Abstract: We are focused on a new method of time series analysis that is based on the extraction of representative keypoints. We use the multi-scale theory based on non-traditional kernels derived from the theory of F-transforms. The sequence of kernels corresponds to what are called 2-adic fuzzy partitions. This leads to simplified algorithms and comparable efficiency in the selection of keypoints. We reduce the number of representative keypoints and enhance the robustness of their selection. We also propose a new keypoint descriptor and test it on matching financial time series with high volatility.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

A new method of time series analysis based on the selection of representative keypoints with subsequent reverse reconstruction is proposed and tested. Keypoints serve as indicators of the areas in which a functional object (time series, image, etc.) has clearly expressive features compared to other, nearly flat areas. The way of extracting keypoints and related features is similar to the processing of data by neural networks. Indeed, the latter is focused on stable feature extraction that is invariant with respect to various geometric transformations.

In the related domain of image processing, features are associated with keypoints and their descriptors. Both are identified with local areas of the image that correspond to its content (not related to the background). Thus, the processing is computationally expensive and depends on many space-environment conditions: illumination, position, resolution, etc. In [10], it has been shown that invariant local features (Harris corners and their rotation-invariant descriptors) can contribute to solving general problems of image recognition. However, the Harris corner detector is sensitive to changes in image scale. This was eliminated in [6], where the descriptor known in the literature as SIFT was proposed. The method of SIFT has inspired many modifications: SURF, PCA-SIFT, GLOH, Gauss-SIFT, etc. (see [5] and references therein), aimed at improving efficiency in various senses: reliability, computation time, etc. However, the main stages and their semantics have been preserved.

Our contribution to this topic is as follows: we use the basic methodology of SIFT and its modifications, but select non-traditional kernels derived from the theory of F-transforms [7]. This allows us to simplify the scaling and selection of keypoints, as well as to reduce their number and enhance robustness. We also propose a new keypoint descriptor and test it on matching financial time series with high volatility.

The main theoretical result that we arrive at here is that the Gaussian kernel, predominant in scale-space theory, can be replaced with the same success by a special symmetric positive semi-definite kernel with local support. In particular, we show that the generating function of a triangular-based uniform fuzzy partition of R can be used for determining such a kernel. This fact allows us to build upon the theory of F-transforms and its ability to extract features (keypoints) with a clear understanding of their semantic meaning [8].

2 Briefly about the theory of scale-space representations

We start with a brief overview (see [4]) of the mentioned theory because it explains the proposed methods. Perhaps the quad-tree methodology is the first type of multi-scale representation of image data. It focuses on recursively dividing an image into smaller areas controlled by the intensity range. The low-pass pyramid representation then facilitated multi-scaling in such a way that the image size decreased exponentially compared to the scale level.

Koenderink [3] emphasized that scaling up and down the internal scope of observations and handling image structures at all scales (in accordance with the task) contribute to a successful image analysis. The challenge is to understand the image at all relevant scales at the same time, but not as an unrelated set of derived images at different levels of blur.

The basic idea (in Lindeberg [4]) of how to obtain a multi-scale representation of an object is to embed it into a one-parameter family of gradually smoothed objects in which fine-scale details are sequentially suppressed. Under fairly general conditions, the author showed that the Gaussian kernel and its derivatives are the only possible smoothing kernels. These conditions are mainly linearity and shift invariance, combined with various ways of formalizing the notion that structures on a coarse scale should correspond to simplifications of corresponding structures on a fine scale.

A scale-space representation differs from a multi-scale representation in that it uses the same spatial sampling at
all scales and one continuous scale parameter as the generator. By the construction in Witkin [11], a scale-space representation is a one-parameter family of derived signals constructed using convolution with a one-parameter family of Gaussian kernels of increasing width.

Formally, a scale-space family of a continuous signal is constructed as follows. For a signal f : R^N → R, the scale-space representation L : R^N × R_+ → R is defined by:

    L(·, 0) = f(·),
    L(·, t) = g(·, t) ⋆ f,    (1)

where t ∈ R_+ is the scale parameter and g : R^N × R_+ → R is the Gaussian kernel

    g(x, t) = 1/(2πt)^(N/2) · exp(−∑_{i=1}^{N} x_i² / (2t)).

The scale parameter t relates to the standard deviation of the kernel g and is a natural measure of spatial scale at the level t.

As an important remark, we note that the scale-space family L can be defined as the solution to the diffusion (heat) equation

    ∂_t L = (1/2) ∇^T∇ L,    (2)

with initial condition L(·, 0) = f. The Laplace operator ∇^T∇ (or ∆), the divergence of the gradient, is taken in the spatial variables.

The solution to (2) in one dimension, in the case where the spatial domain is R, is known as the convolution (⋆) of f (the initial condition) and the fundamental solution:

    L(·, t) = g(·, t) ⋆ f,    (3)
    g(x, t) = 1/√(2πt) · exp(−x²/(2t)).    (4)

The following two questions arise: is this approach the only reasonable way to perform low-level processing, and are Gaussian kernels and their derivatives the only smoothing kernels that can be used? Many authors [4, 11, 3] answer these questions positively, which leads to the default choice of Gaussian kernels in most image processing tasks. In this article, we want to expand the set of useful kernels suitable for performing scale-space representations. In particular, we propose to use kernels arising from generating functions of fuzzy partitions.

3 Space with a Fuzzy Partition as a Universe with Closeness

In this section, we introduce a space that plays an important role in our research. A space with a fuzzy partition is considered as a space with a proximity (closeness) relation, which is a weak version of a metric space. As we indicated at the beginning, our goal is to extend the Laplace operators to those that take into account the specifics of spaces with fuzzy partitions. Then, the next goal is to show that the diffusion (heat conduction) equation in (2) can be extended to spaces with closeness, where the concepts of derivatives are adapted to nonlocal cases. Both things allow us to use the theory of scale-space representations and propose on their basis a new method for localizing keypoints.

Let us first recall the basic definitions of all related concepts.

3.1 Fuzzy partition

Definition 1: Fuzzy sets A_1, …, A_n : [a, b] → R establish a fuzzy partition of the real interval [a, b] with nodes a = x_1 < … < x_n = b, if for all k = 1, …, n, the following conditions are valid (we assume x_0 = a, x_{n+1} = b):

1. A_k(x_k) = 1, A_k(x) > 0 if x ∈ (a_k, b_k), a ≤ a_k < b_k ≤ b;
2. ⋃_{k=1}^{n} (a_k, b_k) = (a, b);
3. A_k(x) = 0 if x ∉ [a_k, b_k];
4. A_k(x) is continuous on [a_k, b_k].

The membership functions A_1, …, A_n are called basic functions [7].

Definition 2: The fuzzy partition A_1, …, A_n, where n ≥ 2, is h-uniform if the nodes x_1 < … < x_n are h-equidistant, i.e., for all k = 1, …, n − 1, x_{k+1} = x_k + h, where h = (b − a)/(n − 1), and there exists an even function A_0 : [−1, 1] → [0, 1] such that A_0(0) = 1 and, for all k = 1, …, n:

    A_k(x) = A_0((x − x_k)/H),  x ∈ [x_k − H, x_k + H] ∩ [a, b],

where H > h/2. A_0 is called a generating function of a uniform fuzzy partition [7]. A generating function A_0 is normalized if

    ∫_{−1}^{1} A_0(x) dx = 1.

Remark 1. The rescaled generating function A_H(x) = A_0(x/H) generates the corresponding kernel A_H(x − y).

3.2 Discrete Universe with Closeness

In this section, we introduce a finite space with a binary relation of closeness and then show how closeness can be related to a uniform fuzzy partition.

The best formal model of a space with closeness is a weighted graph G = (V, E, w), where V = {v_1, …, v_ℓ} is a finite set of vertices and E (E ⊂ V × V) is a set of weighted edges, so that w : E → R_+. The edge e = (v_i, v_j) connects two vertices v_i and v_j, and then the weight of e is w(v_i, v_j), or just w_{ij}. Weights are set using the function w : V × V → R_+, which is symmetric (w_{ij} = w_{ji}, ∀ 1 ≤ i, j ≤ ℓ),
non-negative (w_{ij} ≥ 0), and such that w_{ij} = 0 if (v_i, v_j) ∉ E. The notation v_i ∼ v_j denotes two adjacent vertices v_i and v_j with an existing edge connecting them. The function w sets the closeness on V.

There are many models of closeness in the literature with the default options: Gaussian and uniform distributions [1]. Below, we propose a new model based on a uniform fuzzy partition.

We use the above given (graph) terminology and assume that the set of vertices V is identified with the set of indices V = {1, …, ℓ} and that the corresponding interval [1, ℓ] is 1-uniform fuzzy partitioned with normalized basic functions A^H_1, …, A^H_ℓ, so that A^H_k(x) = A_H(x − k), k = 1, …, ℓ, A_H(x) = A_0(x/H), H ≥ 1, and A_0 is the generating function.

Definition 3: A graph G_H = (V, E, w_H) is fuzzy weighted if V = {1, …, ℓ} and the weight function w_H : V × V → [0, 1] is determined by a 1-uniform fuzzy partition A^H_1, …, A^H_ℓ of [1, ℓ], where H ≥ 1, so that w_H(v_i, v_j) = A^H_i(j), i, j = 1, …, ℓ.

4 Discrete (Non-local) Laplace operator

In this section, we aim to develop elements of functional analysis on spaces with closeness in order to be able to introduce an operator with properties similar to the Laplacian. We recall the definition of the (non-local) Laplace operator as a differential operator given by the divergence of the gradient of a function (see [2]). On spaces with closeness, the generalized version known as the Laplace-Beltrami operator is used. The definition is based on the self-adjoint property, which leads us to definitions of the corresponding Hilbert spaces.

Let G = (V, E, w) be a weighted graph model of a space with closeness, and let f : V → R be a real-valued function. Let H(V) denote the Hilbert space of real-valued functions on V such that, for f, h ∈ H(V), the inner product is ⟨f, h⟩_{H(V)} = ∑_{v∈V} f(v)h(v). Similarly, H(E) denotes the space of real-valued functions defined on the set E of edges of the graph G. This space has the inner product ⟨F, H⟩_{H(E)} = ∑_{(u,v)∈E} F(u, v)H(u, v) = ∑_{u∈V} ∑_{v∼u} F(u, v)H(u, v), where F, H : E → R are two functions in H(E).

The difference operator d : H(V) → H(E) of f is defined on (u, v) ∈ E by

    (df)(u, v) = √(w(u, v)) (f(v) − f(u)).    (5)

The directional derivative of f, at vertex v ∈ V, along the edge e = (u, v), is defined as:

    ∂_v f(u) = (df)(u, v).    (6)

The adjoint d* : H(E) → H(V) of the difference operator is a linear operator defined by:

    ⟨df, H⟩_{H(E)} = ⟨f, d*H⟩_{H(V)},    (7)

for any function H ∈ H(E) and function f ∈ H(V).

Proposition 1: The adjoint operator d* can be expressed at a vertex u ∈ V by the following formula:

    (d*H)(u) = ∑_{v∼u} √(w(u, v)) (H(v, u) − H(u, v)).    (8)

The divergence operator, defined by −d*, measures the network outflow of a function in H(E) at each vertex of the graph.

The weighted gradient operator of f ∈ H(V), at vertex u ∈ V, ∀(u, v_i) ∈ E, is a column vector:

    ∇_w f(u) = (∂_v f(u) : v ∼ u)^T = (∂_{v_1} f(u), …, ∂_{v_k} f(u))^T.

The weighted Laplace operator ∆_w : H(V) → H(V) is defined by:

    ∆_w f = −(1/2) d*(df).    (9)

Proposition 2 [2]: The weighted Laplace operator ∆_w at f ∈ H(V) acts as follows:

    (∆_w f)(u) = −∑_{v∼u} w(u, v)(f(v) − f(u)).

This Laplace operator is linear and corresponds to the graph Laplacian.

Proposition 3 [9]: Let G_H = (V, E, w_H) be a fuzzy weighted graph corresponding to the 1-uniform fuzzy partition of V = {1, …, ℓ}. Then, the weighted Laplace operator ∆_H at f ∈ H(V) acts as follows:

    (∆_H f)(i) = −∑_{i∼j} A^H_i(j)(f(j) − f(i)) = f(i) − F^H[f]_i,

where F^H[f]_i, i = 1, …, ℓ, is the i-th discrete F-transform component of f, cf. [7].

5 Multi-scale Representation in a Space with a Fuzzy Partition

Taking into account the introduced notation, we propose the following scheme for the multi-scale representation L_FP of a signal f : V → R, where V = {1, …, ℓ} and the subscript "FP" stands for a 1-uniform fuzzy partition determined by the parameter H ∈ N, H ≥ 1:

    L_FP(·, 0) = f(·),
    L_FP(·, t) = F^(2^t H)[f],    (10)

where t ∈ N is the scale parameter and F^(2^t H)[f] is the complete vector of F-transform components of f. The scale parameter t relates to the length of the support of the corresponding basic function. As in the case of (1), it is a natural measure of spatial scale at level t. To show the relationship to the diffusion equation, we formulate the following general result.

Proposition 4: Assume that f : [a, b] → R is a twice continuously differentiable real function and that [a, b] is h-uniform
fuzzy partitioned by A^H_1, …, A^H_n and A^{2H}_1, …, A^{2H}_n, where the basic functions A^H_i (A^{2H}_i), i = 1, …, n, are generated by A_0(x) = 1 − |x| with the nodes at x_i = a + (b − a)/(n − 1) · (i − 1). Then,

    F^{2H}[f]_i − F^H[f]_i ≈ (H²/4) f″(x_i).    (11)

The semantic meaning of this proposition in relation to the proposed scheme (10) of multi-scale representation L_FP of f is as follows: the F-transform (FT)-based Laplacian of f in (11) can be approximated by the (weighted) differences of two adjacent convolutions determined by the triangular-shaped generating function.

6 Experiments with Time Series

6.1 Reconstruction from FT-based Laplacians

To demonstrate the effectiveness of the proposed representation, we first show that an initial time series can be (with sufficient precision) reconstructed from a sequence of FT-based Laplacians. Below, we illustrate this claim on a financial time series with high volatility. With each value of t = 1, 2, … we obtain the corresponding FT-based Laplacian as the difference between two adjacent convolutions (vectors with F-transform components), so that we obtain the sequence

    {L_FP(·, t + 1) − L_FP(·, t) | t = 1, 2, …}.

The stopping criterion is the closeness to zero of the current difference. We then compute the reconstruction by summing all the elements in the sequence. Figure 1 shows the step-by-step reconstruction and the final reconstructed time series. The latter is plotted in the bottom image along with the original time series to give confidence in a perfect fit. The estimated Euclidean distance is 89.6.

In the same Figure 1, we show an MLP reconstruction of the same time series with the following configuration: 4 hidden layers with 4086 neurons in each layer (a common setting) and learning rate 0.001. It is obvious that the proposed multi-scale representation and subsequent reconstruction are computationally cheaper and give results with better reconstruction quality. To confirm, we give estimates of the Euclidean distances between the original time series and its reconstructions: 89.6 (from a sequence of FT-based Laplacians) against 159.3 (using MLP).

6.2 Keypoint Localization and Description

Keypoint localization. The localization accuracy of keypoints depends on the problem being solved. When analyzing time series, the accuracy requirements are different from those used in computer vision to match or register images. Time series analysis focuses on comparing the target and reference series in order to detect similarities and use them to make a forecast. Therefore, the spatial coordinate is not so important, in contrast to the comparative analysis of local trends and their changes in time intervals with adjacent keypoints as boundaries.

Taking into account the above arguments, we propose to localize and identify keypoints from the second-to-last scaled representation of the Laplacian before the latter meets the stopping criterion. We then follow the technique suggested in [6, 5] and identify a keypoint with a local extremum point of the Laplacian corresponding to the selected scale. As in the works cited above, we faced a number of technical problems related to the stability of local extrema, the sampling frequency in a scale, etc. Due to the different spatial organization of the analyzed objects (time series versus images), we found simpler solutions to the problems raised. For example, in order to exclude extrema that are close to each other (and are therefore very unstable), we keep only one representative, the value of which gives the best semantic correlation with the characteristic of this particular extremum.

Below, we give illustrations of some of the time series we processed. They were selected from the Yahoo Finance site with historical data. We analyzed the 2016 daily adjusted closing prices of international stock indices, namely Prague (PX), Paris (FCHI), Frankfurt (GDAXI) and Moscow (MOEX). Due to the daily nature of the time series, they all have high volatility, which is additional support for the proposed method. In Figure 2, we show the time series with the stock index PX (Prague) and its last three scaled representations of the Laplacian, where the latter satisfies the stopping criterion. Selected (filtered out) keypoints are marked with red (blue) dots.

Keypoint description. Due to the specificity of time series with high volatility, we propose a keypoint descriptor as a vector that includes only the Laplacian values at keypoints from two adjacent scales, within the area bounded by an interval whose boundaries are set by the adjacent left/right keypoints from the same scale. In addition, we normalize the keypoint descriptor coordinates by the Laplacian value of the principal keypoint. As our experiments with matching keypoint descriptors of different time series show, the proposed keypoint descriptor is robust to noise and invariant with respect to spatial shifts and time series ranges. The last remark is that the quality of matching is estimated by the Euclidean distance between keypoint descriptors.

To illustrate the assertion about robustness and invariance, we show (Figure 3) the results of matches between principal keypoints of time series with stock indices PX (Prague), FCHI (Paris), and GDAXI (Frankfurt). In all cases, the stock index PX was considered as tested and compared against the reference stock indices FCHI and GDAXI.
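The pipeline of Sections 5 and 6.1 (the 2-adic multi-scale F-transform representation, the FT-based Laplacians as differences of adjacent scales, the telescoping reconstruction, and extremum-based keypoint candidates) can be sketched in pure Python. This is a minimal illustration under our own assumptions: normalized triangular basic functions with one node per sample, a support parameter that doubles per level, and a synthetic volatile signal standing in for a stock-index series; all function names are illustrative, not from the paper.

```python
import math
import random

def a0(x):
    """Triangular generating function A0(x) = max(1 - |x|, 0)."""
    return max(1.0 - abs(x), 0.0)

def ft_components(f, h):
    """Discrete F-transform of f with normalized basic functions
    A_k(j) = A0((j - k) / h), one node per sample index."""
    n = len(f)
    out = []
    for k in range(n):
        num = den = 0.0
        for j in range(max(0, k - h + 1), min(n, k + h)):
            w = a0((j - k) / h)
            num += w * f[j]
            den += w
        out.append(num / den)
    return out

def multiscale(f, base_h, levels):
    """L_FP(., t): level 0 is the signal itself; at level t >= 1 the
    support parameter doubles each time (the 2-adic sequence H, 2H, 4H, ...)."""
    reps = [list(f)]
    for t in range(1, levels + 1):
        reps.append(ft_components(f, base_h * 2 ** (t - 1)))
    return reps

def ft_laplacians(reps):
    """FT-based Laplacians: differences of adjacent scaled representations."""
    return [[b - a for a, b in zip(reps[t], reps[t + 1])]
            for t in range(len(reps) - 1)]

# A toy volatile signal in place of a daily stock-index series.
random.seed(1)
f = [math.sin(0.15 * i) + 0.3 * random.uniform(-1.0, 1.0) for i in range(120)]

reps = multiscale(f, base_h=2, levels=5)
laps = ft_laplacians(reps)

# Telescoping reconstruction: the coarsest representation minus the sum of
# all Laplacians returns the original signal (cf. the near-perfect fit in Fig. 1).
recon = [reps[-1][i] - sum(lap[i] for lap in laps) for i in range(len(f))]
err = math.sqrt(sum((x - y) ** 2 for x, y in zip(f, recon)))

# Keypoint candidates: local extrema of the second-to-last Laplacian (Sec. 6.2).
lap = laps[-2]
keypoints = [i for i in range(1, len(lap) - 1)
             if (lap[i] - lap[i - 1]) * (lap[i + 1] - lap[i]) < 0]

print(f"reconstruction error: {err:.2e}, keypoint candidates: {len(keypoints)}")
```

Because the basic functions are normalized, each component is a weighted local mean, so by Proposition 4 each Laplacian value approximates (H²/4) f″ at the node; the reconstruction is exact up to floating-point rounding by the telescoping identity.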
Conclusion
We are focused on a new fast and robust algorithm of im-
age/signal feature extraction in the form of representative
keypoints. We have contributed to this topic by showing
that the use of non-traditional kernels derived from the the-
ory of F-transforms [7] leads to simplified algorithms and
comparable efficiency in the selection of keypoints. More-
over, we reduced their number and enhanced robustness.
This has been shown at the theoretical and experimen-
tal levels. We also proposed a new keypoint descriptor
and tested it on matching financial time series with high
volatility.


Acknowledgment
The work was supported by the ERDF/ESF project "Centre for the development of Artificial Intelligence Methods for the Automotive Industry of the region" (No. CZ.02.1.01/0.0/0.0/17-049/0008414).
   Additional support of the grant project SGS18/PrF-MF/2021 (University of Ostrava) is gratefully acknowledged.


References
[1] Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation, 15(6) (2003) 1373–1396.
[2] Elmoataz, A., Lezoray, O., Bougleux, S.: Discrete regularization on weighted graphs: A framework for image and manifold processing, IEEE Transactions on Image Processing, 17 (2008) 1047–1060.
[3] Koenderink, J. J.: The structure of images, Biological Cybernetics, 50 (1984) 363–370.
[4] Lindeberg, T.: Scale-space theory: a basic tool for analyzing structures at different scales, Journal of Applied Statistics, 21 (1994) 225–270.
[5] Lindeberg, T.: Image matching using generalized scale-space interest points, Journal of Mathematical Imaging and Vision, 52(1) (2015) 3–36.
[6] Lowe, D. G.: Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, 60 (2004) 91–110.
[7] Perfilieva, I.: Fuzzy transforms: Theory and applications, Fuzzy Sets and Systems, 157(8) (2006) 993–1023.
[8] Molek, V., Perfilieva, I.: Deep Learning and Higher Degree F-transforms: Interpretable Kernels Before and After Learning, International Journal of Computational Intelligence Systems, 13(1) (2020) 1404–1414.
[9] Perfilieva, I., Vlasanek, P.: Total variation with nonlocal FT-Laplacian for patch-based inpainting, Soft Computing, 23 (2019) 1833–1841.
[10] Schmid, C., Mohr, R.: Local grayvalue invariants for image retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(5) (1997) 530–535.
[11] Witkin, A. P.: Scale-space filtering, Proc. 8th International Joint Conference on Artificial Intelligence, IJCAI'83, 2 (1983) 1019–1022.
Figure 1: Top. The sequence of reconstruction steps, where with each value t = 1, 2, . . . we improve the quality of the
reconstruction by adding the corresponding Laplacian to the previous one. Middle. The original time series (gray) against
its reconstruction (black) from a sequence of FT-based Laplacians with the distance 89.6. Bottom. The original time series
(gray) against its MLP reconstruction (black) with the distance 159.3.
Figure 2: The time series with the stock index PX (Prague) and its last three scaled representations of the Laplacian, where the very last (bottom) graph satisfies the stopping criterion. Selected (filtered out) keypoints are marked with bold dots.
Figure 3: The results of matches between principal keypoints of time series with stock indices PX (Prague), FCHI (Paris), and GDAXI (Frankfurt). The stock index PX is considered as tested and compared against the reference stock indices FCHI (top image) and GDAXI (bottom image).