Fractal Methods in Information Technologies of Processing and Analysis of Parametrically Related Data Streams

© Alexei V. Myshev © Mikhail I. Turitsyn
National Research Nuclear University MEPhI (IATE), Obninsk, Russia
mishev@iate.obninsk.ru

Abstract. This work describes a new approach to constructing models and logical schemes of algorithms and procedures for information technologies that process and analyze parametrically related and unrelated data streams within the fractal paradigm. Data streams are defined here as information objects whose physical nature can be arbitrary. The information object is investigated apart from any model or scheme, and the logical scheme of the intellectual technology is built in the form: facts, regularities and reality. Fractal methods form the framework of the logical, algorithmic and content essence of the approach. The basic premise of the approach is as follows. First, a parametrically related or unrelated data stream is processed and analyzed to determine whether it forms a fractal structure, and a phase portrait of the data stream as an information object is constructed. Second, the phase portrait is used to distinguish the areas of fractal percolation and aggregation in the multifractal structure of the stream. Third, the spatial and temporal scales of the fractal percolation and aggregation processes are estimated.

Keywords: data stream, percolation function, fractal dimension, percolating fractal, fractal aggregate.

1 Introduction

This paper describes a new approach to constructing models and logical schemes of algorithms and procedures for information technologies that process and analyze parametrically related data streams within the fractal paradigm. The methodology for constructing such models and schemes is based on constructing the percolation function of a real parametrically related data stream and its information phase portrait. Data streams are defined here as information objects whose physical nature can be arbitrary. The information object is not investigated within any model or scheme; instead, the logical scheme of the intellectual technology is built in the form: facts, regularities and reality. Fractal methods form the framework of the logical, algorithmic and content essence of the approach.

The main premise of the content-semantic essence of fractal methods in information technologies for processing and analyzing large data streams is as follows. First, the fractal dimension (geometric and universal) of the data stream is calculated to determine whether the stream forms a multifractal structure. Second, if the initial data stream is a fractal object, the stream of integer values of the percolation function is formed. Third, a phase portrait of the data stream is constructed and analyzed; the areas of fractal percolation and aggregation are highlighted in its structure; and the degree of discrepancy between the geometric and information fractal dimensions is estimated as an indicator of the unity of the quantitative and qualitative characteristics of the stream. Fourth, the spatial and temporal scales of the fractal percolation and aggregation processes are estimated.

In contrast to traditional methods [1,2,3], the fractal paradigm in the methodology of developing and implementing information technologies for processing, analysis and classification of large data streams makes it possible to take into account both the regularity and the irregularity of the structure of the state space of the data stream on the information scale, as well as their dynamic and informational aspects.

The paper is structured as follows. First, we identify and show the fractal properties of information objects. Second, we formulate the task of processing and analyzing parametrically related data streams on the basis of the fractal paradigm. Third, we use the results to show the importance of this work for the development and implementation of IT technologies for processing and analyzing large data streams.

2 Data streams (processing, analysis and classification)

2.1 Fractal properties of data stream

The key point in formulating the fractal properties of a data stream is the introduction of the concept of a percolation function into the subject domain. In this paper the concept is defined as an attribute by means of which the fractal properties of a data stream are denoted and described. It does not correspond in any way to the known probabilistic definition of the percolation function from percolation theory.

To obtain numerical estimates of the geometric measure of the fractal dimension of a parametrically related information object as a spatial structure, we used the well-known Hausdorff-Besicovitch formula [4,5], whose classical definition is as follows.

Let the initial data stream form a metric set $M$ on which the $\lambda$-dimensional outer measure $l_\lambda(M)$ is defined as follows. Consider a $\rho$-covering of the set $M$, that is, a countable covering of this set by spheres $S_i$ of diameter $d_i < \rho$, and introduce the measure

$$l_\lambda(M,\rho) = \inf \sum_i d_i^{\lambda}, \tag{1}$$

where the infimum is taken over all $\rho$-coverings of the set $M$. The limit

$$l_\lambda(M) = \lim_{\rho \to 0} l_\lambda(M,\rho), \tag{2}$$

finite or infinite, exists and, as a function of $M$, is an outer measure.

The Hausdorff dimension $\dim M$ of the set $M$ is determined by the behavior of $l_\lambda(M)$ not as a function of $M$, but as a function of $\lambda$:

$$\dim M = \sup\{\lambda : l_\lambda(M) = \infty\} = \inf\{\lambda : l_\lambda(M) = 0\}, \tag{3}$$

that is, $\dim M$ is the "transition point": for $\lambda > \dim M$ the value $l_\lambda(M) = 0$, and for $\lambda < \dim M$ the value $l_\lambda(M)$ is infinite.

Unfortunately, the mathematical apparatus of fractal theory based on the Hausdorff dimension is of little use for describing time series and parametrically related information objects. Therefore, to identify patterns that arise from the temporal aspect and the parametric connection of events in the information space of the data stream, it is necessary first of all to determine a measure of the fractal dimension of the parametric or temporal structure. For this purpose, the formula for estimating the fractal dimension of time structures and Hurst statistics are used.

The following empirical law is used to estimate the fractal dimension of a parametrically related data stream [6]:

$$\frac{\langle R(m;\tau)\rangle_T}{\langle R(1;\tau)\rangle_T} = m^{F}, \tag{4}$$

where $\langle R(m;\tau)\rangle_T$ is a measure of the temporal or parametrically related information structure on the interval $T_m$ of change of the time or coupling parameter; $\langle R(1;\tau)\rangle_T$ is the same measure for an interval of length $\tau$; $m$ is an integer; $\tau$ is the duration of a link of the time structure; $T$ is the time period considered; and $F$ is the fractal dimension of the temporal or parametrically related structure.

Hurst, in turn, found that the normalized range $R/S$ for time dependences is well described by the empirical relation

$$R/S = c N^{H}, \tag{5}$$

where $R$ is the range of the values of the data stream elements over the entire interval $T$, $S$ is the standard deviation, $N$ is the multiplicity factor of the period $T$ in standard units, $H$ is the Hurst exponent, and $c$ is a constant.

The Hurst empirical law can be considered a special case of formula (4) for the structure of a parametrically related data stream. In this case the following analogy is valid: the Hurst exponent can be considered an analogue of the fractal dimension estimate $F$ for $S = 1$. It should be noted that the fractal dimension $F$ and the Hurst exponent $H$ are not sensitive to such artifacts and phenomena as intermittency in a random medium and singular errors. This is largely because the above approaches to analyzing the fractal structure of data streams do not reflect the information nature of the objects investigated. Based on these assumptions, we obtained a formula for estimating the universal fractal dimension, which is a synergy of the geometric and information dimensions [7,8]:

$$d_b = \lim_{\varepsilon \to 0} \frac{\sum_{i=1}^{K} p_i \log \sum_{j=1}^{K} \left(1 - \rho_{ij}\right) p_j}{\log \varepsilon}, \tag{6}$$

where $p_i$ is the probability that a data stream element $r_i$ falls in the $i$-th subinterval of $\Delta = |r_{max} - r_{min}|$; $\varepsilon$ is the length of a subinterval for a given partition of the interval $\Delta$; $\rho_{ij}$ is a randomized metric between the centers of the $j$-th and $i$-th subintervals; and $r_{max}$ and $r_{min}$ are the maximum and minimum values of the stream elements. Two types of randomized metric were considered in this work: geometric and informational. The geometric metric $\rho_{ij}$ was calculated by the following formula [7]:

$$\rho_{ij} = \frac{|r_i - r_j|}{|r|}, \tag{7}$$

where $|r_i - r_j|$ is the geometric distance between the $i$-th and $j$-th subintervals and $|r|$ is the length of the interval $\Delta$. The information metric was calculated by the relation

$$\rho_{ij} = |\rho_i - \rho_j|. \tag{8}$$

On the one hand, the formulas described above for obtaining various estimates of fractal dimension are integral estimates of the fractal properties of a parametrically related data stream. On the other hand, they allow us to formulate and describe the problem of processing, analysis and classification of a parametrically related data stream within the fractal paradigm.

2.2 Problem statement

For processing parametrically related data streams we propose to use the mathematical and logical apparatus of fractal theory [5,7,8]. The quantitative estimates of the geometric and universal fractal dimension defined above serve as the criterion of regularity of the data stream. The main premise and meaning of this criterion is that the values of the fractal dimension estimates reflect the degree of "holeyness" of the initial data stream with respect to the information scale of measurement.

The primary procedure for processing the original data stream is explained by the following logical scheme and algorithm.

First, the information scale for measuring the values of the elements of the original data stream is determined. In the storage and transmission channels of an information-measuring or computing system, any data stream is defined as an information object, i.e. a binary set. In the language of digital technology, this means that the elements of this set can take two values: one or zero. On this information scale, either a numerical or an information metric can be defined.

Second, the information scale is digitized. The range and the scale division value are digitized by natural numbers.

Third, a stream of integer values of the percolation function of the original data stream is formed. The percolation function reflects and describes the geometric structure of the "leaky information space" of the original data stream.

Fourth, the phase portrait of the data stream is constructed. The phase space is a plane with the following coordinate system: the abscissa axis carries the percolation function values, and the ordinate axis carries the digitized information scale values.

The relationship between the integer scale values and another scale of measurement of the source data stream elements is established by the following attributes:
• the general scale of variability of the values of the elements of the original array in any other non-integer algebraic system of their measurement;
• the common scale division value in a non-integer algebraic system;
• the number of significant digits.

The integer values of the percolation function are determined by the relation

$$h_i = r_i - r_{i-1}, \tag{9}$$

where $h_i$ are the values of the percolation function and $r_i$ is the number of the subinterval into which the corresponding element of the original data stream falls. The number of partitions $L$ of the interval $\Delta = |r_{max} - r_{min}|$ into subintervals is determined by the relation

$$L = \Delta / \varepsilon. \tag{10}$$

The main question of the problem statement is whether the processed and analyzed data stream is a fractal. If so, its fractal structure is described and the integral estimates of the fractal dimension measures are calculated. Solving the problem means obtaining these estimates of the fractal dimension measures and constructing the phase portrait.

2.3 Fragments of processing and analysis

Two data sets were processed and analyzed using fractal methods. In the first case, the data were a stream of values of a regular harmonic function. In the second case, we used empirical data from the measurements of an icm–20608 quadcopter gyroscope.

To solve this problem, a software component implementing the above data stream processing algorithm was developed. The function y = 2.5 sin(2πt) was used as the real harmonic function that induces the data stream. The data stream contained at least ten thousand elements.

The total scale for the harmonic function spanned (−2.5; 2.5). Two scale division intervals were taken, ε = 0.01 and ε = 0.001, giving L = 500 and L = 5000 partitions respectively. The gyroscope data stream values varied in the range (−840; 1270). With the scale division intervals ε = 0.01 and ε = 0.001, the number of partitions was L = 211000 and L = 2110000 respectively.

The calculations yielded the following estimates of the fractal dimension measure, determined by formula (6) for the metrics ρij given by formulas (7) and (8).

For the harmonic function (figure 1) the following numerical estimates of the universal fractal dimension were obtained for the two metrics ρij:
1. Geometric metric:
db = 0.950 for ε = 0.01;
db = 0.827 for ε = 0.001;
2. Information metric:
db = 1.060 for ε = 0.01;
db = 0.837 for ε = 0.001.

These values of the universal fractal dimension db show that the harmonic function generates an information set with a regular structure (the regularity condition is db → 1). Here the geometry of the information set and its information content are in good agreement. This is confirmed by the fact that the db values for the geometric and information metrics are close for the different values of ε.

Figure 1 Graph of the function 2.5 sin(2πt)

A different information picture is observed for the gyroscope values, which are shown in figure 2. The numerical estimates of the universal fractal dimension for the gyroscope data set with the geometric metric are given below:
db = 3.438 for ε = 0.01;
db = 7.439 for ε = 0.001.

The magnitude of db increases significantly as ε decreases, which is the criterion and indicator of irregularity. This information object is a data stream with an irregular structure.

Figure 2 Gyroscope data graph

The values of the percolation function of 2.5 sin(2πt) for different values of the general scale and scale division are shown in figures 3 and 4. As figures 3 and 4 show, at smaller values of ε the percolation function exhibits pronounced regularity, or continuity, in the variability of the computed values of 2.5 sin(2πt).

Figure 3 Percolation graph for 2.5 sin(2πt), ε = 0.01

Figure 4 Percolation graph for 2.5 sin(2πt), ε = 0.001

A geometric illustration of the percolation function for the gyroscope data stream is shown in figure 5. The graph of the percolation function fully reflects the irregularity of the stream and the singularity of the values of its elements.

Figure 5 Percolation graph for gyroscope data, ε = 0.001

This stage of processing and analyzing a parametrically related data stream in the logical chain of the information technology establishes the fractal nature of the data stream, gives numerical estimates of the measures of fractal geometry, and produces the stream of percolation function values.

The second stage in the logical chain of information technologies for processing and analyzing a parametrically related data stream is the construction of its phase portrait. We illustrate the results obtained at this stage using the data streams considered earlier. For the data stream induced by the harmonic function 2.5 sin(2πt), the phase portraits for ε = 0.01 and ε = 0.001 are shown in figures 6 and 7.

Figure 6 Phase portrait for the data stream induced by the function 2.5 sin(2πt), ε = 0.01

The phase portrait shown in figure 6 illustrates the following properties of the fractal nature of the analyzed data stream. First, at the given general scale and scale division interval, distinct fractal properties of the stream do not appear, but the stream will acquire these properties as ε increases. Second, the data stream has the property of regularity, because the framework of the phase portrait is a closed trajectory. Figure 7 shows a phase portrait of the same data stream for a smaller value of ε. The geometric image of this portrait illustrates the regular properties of the data stream quite fully. The framework of the geometric image of the phase portrait forms a pronounced closed trajectory.

Figure 7 Phase portrait for 2.5 sin(2πt), ε = 0.001

Figure 8 shows a geometric illustration of the phase portrait of the gyroscope data.

Figure 8 Gyroscope data phase portrait

The phase portrait shown in figure 8 clearly and fully illustrates the multifractal structure of the stream in phase space, which contains regions of fractal percolation and aggregation. Percolation processes dominate in the region of small gyroscope readings (in the phase portrait, a "clot" of points in the center). The geometry and topology of this region of the phase portrait reflect a regular structure on the set of icm–20608 quadcopter gyroscope values. Such a structure is typical of a stable and regular operating mode. Fractal percolation covers the area of large gyroscope readings.

On the one hand, this region of the phase space reflects the singular processes in the icm–20608 gyroscope. The dynamics of the quadcopter in this case follow a complex trajectory that fits into its regular mode of operation. In figure 8 this feature appears as sharp changes in the gyroscope readings and is indicated by arrows.

On the other hand, the phase portrait makes it possible to reflect and describe the spatial and temporal scales of a non-standard or non-stationary gyroscope operating mode. The phase portrait shown in figure 8 reflects a stationary gyroscope operating mode with singular phase transitions and no disturbances. Full-scale experiments on modeling the non-stationary operating mode of the gyroscope were not carried out for reasons beyond the authors' control. Laboratory studies of this mode of operation and analysis of the results allow us to draw preliminary conclusions. In this case, intermittency of the fractal aggregation processes on fractal percolation trajectories with positive and negative gradients will be observed. As we can see, the spatial and temporal scales of the fractal percolation and aggregation processes are illustrated quite fully and substantively by the phase portrait.

3 Conclusions and some generalizations

The results of processing and analyzing parametrically related data streams presented above allow us to make a number of conclusions and generalizations.

First, fractal methods for processing and analyzing large parametrically related data streams, based on logical schemes for the cognitive analysis of their phase portraits and for decoding the information hidden in them, are a promising and unique paradigm in the development of information technologies and smart information-measuring systems.

Second, streams of parametrically related data can be processed using various processes and methods of fractal theory and genetic data, both for collecting and populating sample data and for analyzing them. These methods and processes reflect and define the features of the resulting estimates of fractal measures and dimensions, as well as the scope of the conclusions that can be drawn from these data. In this case, the phase portrait of the stream is parametrically associated with the data stream. The phase portrait of the data stream determines and describes the regular and irregular properties of its structure relative to the information scale of measurements.

In the broader context of fundamental research on intelligent information-measuring systems and intelligent information technologies, the results of this work made it possible, for the first time, to show how and in what the synergy of such entities as facts, laws and reality is manifested. Is it possible to draw such analogies within the framework of traditional models, algorithms, schemes, etc.? If so, the results of the identified analogies should be shown and the trends of their theoretical development and practical continuation formulated.

The applied aspects of the results are closely related to the solution of a wide range of problems in physical experiment, in the development and implementation of information technologies for monitoring, diagnostics and control of nuclear power plants, and in many other fields. On the one hand, methods of fractal theory are proposed for solving complex nonlinear problems of processing, analyzing and interpreting the results of physical, biological and medical experiments. On the other hand, new IT technologies were developed and implemented in line with the DAMDID theme of processing, analysis and classification of parametrically related data.
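To make the first stages of this technology concrete, the following is a minimal Python sketch, not the authors' software component, of the digitization of the information scale (Eq. 10), the percolation function (Eq. 9) and the phase-portrait points described in Section 2.2. The function names, the numbering of subintervals from 1, the use of the observed minimum and maximum as the scale limits, and the clamping of the maximum value into the last subinterval are assumptions made for illustration only.

```python
import math


def digitize(values, eps):
    """Return (subinterval numbers r_i, number of partitions L).

    Assumption: the scale runs from the observed minimum to the observed
    maximum, and subintervals are numbered 1..L.
    """
    r_min, r_max = min(values), max(values)
    n_parts = max(1, round((r_max - r_min) / eps))  # L = delta / eps, Eq. (10)
    numbers = [
        # the maximum value is clamped into the last subinterval
        min(int((v - r_min) / eps) + 1, n_parts)
        for v in values
    ]
    return numbers, n_parts


def percolation(numbers):
    """Integer percolation function h_i = r_i - r_{i-1}, Eq. (9)."""
    return [cur - prev for prev, cur in zip(numbers, numbers[1:])]


def phase_portrait(numbers):
    """Points (h_i, r_i): abscissa = percolation value, ordinate = scale value."""
    return list(zip(percolation(numbers), numbers[1:]))


if __name__ == "__main__":
    # Harmonic test stream y = 2.5 sin(2*pi*t) with ten thousand elements;
    # the sampling step of 0.001 is an arbitrary choice for this sketch.
    stream = [2.5 * math.sin(2 * math.pi * k / 1000) for k in range(10_000)]
    numbers, n_parts = digitize(stream, eps=0.01)
    points = phase_portrait(numbers)
    print("L =", n_parts, "| portrait points:", len(points))
```

Plotting the returned pairs as a scatter diagram gives a picture of the kind discussed for figures 6 to 8; the exact appearance depends on the sampling step, which the paper does not state.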
For the software implementation of the IT technologies, data from the icm–20608 quadcopter gyroscope were used.

References

[1] Dvoryatkina, S.N.: Integration of fractal and neural network technologies in pedagogical monitoring and assessment of knowledge of trainees. RUDN Journal of Psychology and Pedagogics, 14 (4), pp. 451–465 (2017), DOI: 10.22363/2313-1683-2017-14-4-451-465
[2] Lymar, T.Yu., Matrova, T.S., Staroverova, N.Yu.: Fractal search algorithm in relational databases. Bulletin of the South Ural State University, Series "Computational Mathematics and Software Engineering", 3 (4), pp. 61–74 (2014), DOI: 10.14529/cmse140404
[3] Zmeskal, O., Vesely, M., Nezadal, M., B.: Fractal analysis of image structures. Harmonic and Fractal Image Analysis, 4, pp. 3–5 (2001)
[4] Mandelbrot, B.: How Long Is the Coast of Britain? Statistical Self-Similarity and Fractional Dimension. Science, New Series, 156 (3775), pp. 636–638 (1967), DOI: 10.1126/science.156.3775.636
[5] Feder, J.: Fractals. Plenum Press, New York (1988)
[6] Eganova, I.A.: The nature of space-time. Publishing House of SB RAS, "Geo" Branch, Novosibirsk, Russia (2005)
[7] Myshev, A.V.: Metrological theory of the dynamics interacting objects in the information field of a neural network and a neuron. Information Technology, 4, pp. 52–63 (2012) (in Russian)
[8] Myshev, A.V., Dunin, A.V.: Fractal Methods in Information Technologies for Processing, Analyzing and Classifying Large Flows of Astronomical Data. CEUR Workshop Proc. 1613. Selected Papers of XIX Int. Conf. on Data Analytics and Management in Data Intensive Domains (DAMDID/RCDL 2017). Moscow, Russia, pp. 172–176. http://ceur-ws.org/Vol-2022/
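As a supplementary illustration of the empirical relation (5) in Section 2.1, the sketch below estimates the Hurst exponent H as the least-squares slope of log(R/S) against log(N). It uses the classical rescaled-range procedure, in which R is the range of the cumulative deviations from the window mean; this is an assumption, since the paper does not specify how R/S was evaluated, and the function names and window sizes are hypothetical.

```python
import math
import random


def rescaled_range(window):
    """Classical R/S of one window: range of cumulative deviations over the std."""
    n = len(window)
    mean = sum(window) / n
    cumulative = 0.0
    lowest = highest = 0.0  # the empty partial sum (zero) is included in the range
    for value in window:
        cumulative += value - mean
        lowest = min(lowest, cumulative)
        highest = max(highest, cumulative)
    std = math.sqrt(sum((value - mean) ** 2 for value in window) / n)
    if std == 0.0:
        raise ValueError("constant window: R/S is undefined")
    return (highest - lowest) / std


def hurst_exponent(values, window_sizes):
    """Slope of log(mean R/S) versus log(N), i.e. H in R/S = c * N**H."""
    xs, ys = [], []
    for size in window_sizes:
        # non-overlapping windows of length `size`
        windows = [values[k:k + size]
                   for k in range(0, len(values) - size + 1, size)]
        ratios = [rescaled_range(w) for w in windows if max(w) > min(w)]
        if ratios:
            xs.append(math.log(size))
            ys.append(math.log(sum(ratios) / len(ratios)))
    if len(xs) < 2:
        raise ValueError("need at least two usable window sizes")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
    print("H (white noise):", hurst_exponent(noise, [16, 32, 64, 128, 256, 512]))
```

For uncorrelated noise the estimate is expected to lie near 0.5, with a known upward bias at small window sizes, while a steadily trending stream gives values close to 1.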