=Paper= {{Paper |id=Vol-2893/short_9 |storemode=property |title=Recording and Storage Traffic Management in Storage Systems |pdfUrl=https://ceur-ws.org/Vol-2893/short_9.pdf |volume=Vol-2893 |authors=Tatyana Tatarnikova,Ekaterina Poymanova,Ekaterina Kraeva |dblpUrl=https://dblp.org/rec/conf/micsecs/TatarnikovaPK20 }} ==Recording and Storage Traffic Management in Storage Systems== https://ceur-ws.org/Vol-2893/short_9.pdf
Recording and Storage Traffic Management in Storage Systems
Tatyana Tatarnikova, Ekaterina Poymanova and Ekaterina Kraeva
Russian State Hydrometeorological University, ul. Voronezhskaya, 79, 192007 St. Petersburg, Russia
                Abstract
                The article discusses an integrated solution for managing traffic recording and storage in
                data storage systems. Under current legislation, storing large volumes of data has become an
                acute problem. Physical storage management helps avoid unnecessary costs when scaling
                storage systems. The article proposes the structure of a hardware and software complex for
                managing physical data storage that can be used by owners of technological communication
                networks to store traffic. Two control mechanisms are considered: the distribution of data
                over various media using Kohonen neural networks, and the forecasting of capacity
                extension using a statistical model and machine learning methods.

                Keywords
                traffic, data storage system, data distribution, physical data storage, machine learning, neural
                network, forecasting

1. Introduction
    The requirements of modern legislation in the field of citizen security pose serious challenges to
various organizations, including challenges related to data storage. The anti-terrorist amendments
adopted in 2016 (the so-called “Yarovaya law”) obliged telecom operators to store traffic metadata for
three years and the traffic itself for six months. In addition, in June 2020 the Ministry of Digital
Development, Communications and Mass Media of the Russian Federation proposed a bill under which
the owners of technological communication networks would be required to store traffic for three years [1].
    There is also a legislative norm requiring the capacity of traffic storage to be increased by 15%
annually. Even though the government has postponed the introduction of this norm by one year, the
problem of using and extending the physical resources of the storage remains acute.
    The volume of Internet traffic grew from 32,470.782391 PB to 61,226.217838 PB over the four years
from 2016 to 2019 (Fig. 1) [2]. That is, the volume of traffic almost doubled in this period.
Consequently, the above norm on the annual increase in storage capacity is insufficient, while its
implementation requires significant financial costs.
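
    As a rough check of this claim, the quoted volumes can be compared with what the 15% norm would
deliver. The sketch below is illustrative only: it assumes that the two figures are the 2016 and 2019
annual totals, so that the period contains three yearly growth steps, and all variable names are
hypothetical.

```python
# Traffic volumes quoted in the text, in petabytes (PB).
traffic_start_pb = 32470.782391
traffic_end_pb = 61226.217838

# Assumption: the values are the 2016 and 2019 totals, i.e. three annual steps.
years = 3

growth_factor = traffic_end_pb / traffic_start_pb
annual_traffic_growth = growth_factor ** (1 / years) - 1

# Capacity growth over the same period under the 15% annual norm.
norm_rate = 0.15
capacity_factor = (1 + norm_rate) ** years

print(f"traffic growth factor:  {growth_factor:.2f}")
print(f"implied annual growth:  {annual_traffic_growth:.1%}")
print(f"capacity factor at 15%: {capacity_factor:.2f}")
```

    Under these assumptions traffic grows by a factor of about 1.89, or roughly 23 to 24% per year,
while 15% annual capacity growth compounds to a factor of about 1.52 over the same period.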


Figure 1: Growth in the amount of Internet traffic (PB) by year, 2016-2019


Proceedings of the 12th Majorov International Conference on Software Engineering and Computer Systems, December 10-11, 2020, Online
& Saint Petersburg, Russia
EMAIL: tm-tatarn@yandex.ru; e.d.poymanova@gmail.com; kate.smitt.by@mail.ru
ORCID: 0000-0002-6419-0072; 0000-0001-9318-6454; 0000-0002-6938-1775
             ©️ 2020 Copyright for this paper by its authors.
             Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
             CEUR Workshop Proceedings (CEUR-WS.org)
   In October 2020, “Rostelecom” alone spent 7.8 billion rubles on data storage equipment. Other
operators also purchase various storage systems (Table 1) [3].

Table 1
Costs of telecom operators for data storage systems
            No.    Month, year       Company        Cost, million $
             1     May 2018          MegaPhone           12.64
             2     March 2019        MTS                191.66
             3     October 2020      Rostelecom         106.78
             4     planned           Tele2               58.87

    As can be seen from Table 1, telecom operators of the Russian Federation incur significant costs
for the purchase of storage system equipment. On the other hand, modern information technologies
make it possible to manage the resources of data storage systems, use them efficiently and,
therefore, avoid unnecessary costs.
    The data storage system can manage the recording of the incoming data stream: first, by
distributing it among different types of media, and second, by monitoring the state of the storage and
forecasting capacity growth so that the storage can be extended in time.

2. Physical Data Storage Management During Traffic Recording and Storage
  A study has been carried out in which a data storage system is considered as a storage
management system that performs the following functions:
      • Distribution of data files across various types of media, depending on file size and storage
          time
      • Monitoring of the storage state based on snapshots of the state of each medium
      • Forecasting of storage capacity extension [4,5].
  The storage management system diagram is shown in Figure 2.




Figure 2: Data Storage Management System

    Implementing such a storage management system requires a software-hardware system that performs
the above functions.
    It is proposed that this system include a programmable logic controller (PLC), which distributes
files to media, and software that monitors the state of the physical storage and builds a forecast for
its extension (Figure 3).
   The controller receives an incoming data stream (for example, Internet traffic). The controller
clusters the incoming traffic using Kohonen neural networks and, in accordance with the
resulting topological map, distributes data files to media in the physical data storage.
   Physical storage can be organized depending on the information being recorded. Paper [6]
considered a 3x3 storage matrix in which files are first assigned to one of the storage levels,
depending on the storage time, and then distributed among the volumes of that level, depending
on the file size.
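
   The clustering step can be illustrated with a small Kohonen map. The following is a minimal sketch,
not the controller algorithm from [4] or [6]: it assumes that each file is described by two normalized
features (log file size and storage time), uses a 3x3 map to mirror the 3x3 storage matrix, and treats
each map cell as one storage volume. The feature scaling, training schedule, sample files and all
function names are assumptions made for illustration.

```python
import math
import random

def features(size_bytes, storage_days, max_size=10**10, max_days=3 * 365):
    """Scale log file size and storage time to roughly [0, 1] (assumed scaling)."""
    return [math.log10(size_bytes) / math.log10(max_size), storage_days / max_days]

def best_matching_unit(weights, x):
    """Map cell whose weight vector is closest to x (squared Euclidean distance)."""
    return min(weights, key=lambda n: sum((a - b) ** 2 for a, b in zip(weights[n], x)))

def train_som(samples, rows=3, cols=3, epochs=200, seed=0):
    """Train a small Kohonen map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        remaining = 1.0 - epoch / epochs
        lr = 0.5 * remaining                               # learning rate decays
        radius = max(rows, cols) / 2 * remaining + 0.5     # neighbourhood shrinks
        for x in rng.sample(samples, len(samples)):        # shuffled pass over the data
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * radius ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights

# Hypothetical files: (size in bytes, required storage time in days).
files = [(2_000, 30), (5_000, 45), (3_000_000, 400),
         (8_000_000, 380), (4_000_000_000, 1000), (6_000_000_000, 1090)]
samples = [features(size, days) for size, days in files]

som = train_som(samples)
cells = [best_matching_unit(som, x) for x in samples]
for f, cell in zip(files, cells):
    print(f, "-> storage matrix cell", cell)
```

   In a real system the map would be trained on observed traffic files, and each cell would be bound
to a concrete level and volume of the physical storage by the administrator.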
   This solution can be easily adapted to the needs of the owners of technological communication
networks [7]. Since the storage time of data files and metadata is limited to three years under the
existing legislation and the upcoming amendments, data files can be distributed across
various types of media depending on the type of data (text, sound, video) and size.




Figure 3: Structure of Hardware and Software System

   The structure of physical data storage is determined by the storage system administrator and can
contain, for example, RAID arrays for text files and tape drives (streamers) for audio and video files.
In addition, volumes inside a RAID array can have different operating systems with different sizes of
the logical data block, which helps avoid "under-filled" blocks when files are written (Figures 4, 5).




Figure 4: Structure of Physical Storage
Figure 5: Comparing files with the same amount of data and logical blocks of different sizes

   This partitioning helps conserve disk space on RAID arrays.
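
   The effect of the logical block size can be illustrated with simple arithmetic: a file occupies a
whole number of blocks, so the unused tail of its last block is wasted. The sketch below is
illustrative; the function names, candidate block sizes and file sizes are assumptions, not values
from the paper.

```python
def allocated_bytes(file_size, block_size):
    """Space taken by a file stored in whole logical blocks (ceiling division)."""
    blocks = -(-file_size // block_size)
    return blocks * block_size

def slack_bytes(file_size, block_size):
    """Unused bytes in the last, partially filled block."""
    return allocated_bytes(file_size, block_size) - file_size

def best_block_size(file_sizes, candidates):
    """Candidate block size with the least total slack for the given files."""
    return min(candidates, key=lambda b: sum(slack_bytes(s, b) for s in file_sizes))

small_text_files = [700, 1_500, 5_000, 12_000]  # bytes, hypothetical
for block in (512, 4096, 65536):
    wasted = sum(slack_bytes(s, block) for s in small_text_files)
    print(f"block {block:>6} B: {wasted} B wasted")

print("least slack with block size:", best_block_size(small_text_files, (512, 4096, 65536)))
```

   For small files a small block wastes the least space, which is why routing files of similar size
to a volume with a matching block size can save capacity.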
   The state of the storage can be monitored using state snapshots coming from the physical
data storage. These snapshots should show how full each medium (media volume) in the
physical storage is.
   It is planned to use the monitoring data to build a forecast of the capacity extension of physical
storage, with each storage tier containing a specific type of media considered separately.
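
   A state snapshot of this kind could be represented as follows. This is a hypothetical sketch: the
field names, units and the rule for flagging a medium are assumptions, not the format used by the
authors' software.

```python
from dataclasses import dataclass

@dataclass
class MediaSnapshot:
    """One medium's state at the moment of the snapshot (hypothetical fields)."""
    media_id: str
    used_gb: float    # current volume of stored data
    limit_gb: float   # limited capacity: reaching it should trigger scaling
    max_gb: float     # maximum capacity of the medium

    @property
    def fullness(self) -> float:
        """Fraction of the maximum capacity that is already used."""
        return self.used_gb / self.max_gb

def media_to_scale(snapshots):
    """IDs of media whose used volume has reached the limited capacity."""
    return [s.media_id for s in snapshots if s.used_gb >= s.limit_gb]

snapshots = [
    MediaSnapshot("raid-1", used_gb=500.0, limit_gb=800.0, max_gb=1000.0),
    MediaSnapshot("tape-1", used_gb=4300.0, limit_gb=4000.0, max_gb=5000.0),
]
for s in snapshots:
    print(f"{s.media_id}: {s.fullness:.0%} full")
print("needs scaling:", media_to_scale(snapshots))
```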
   The capacity growth forecast is based on the model presented in Figure 6 [8].




Figure 6: Characteristics of Data Media (labels: Vcur, Vlim, Vmax; tlim, tmax; time for scaling)

$$V_{lim} = \int_{1}^{t_{lim}} f(t)\,dt = T \sum_{t=1}^{t_{lim}} f(t), \qquad (1)$$

$$V_{max} = \int_{1}^{t_{max}} f(t)\,dt = T \sum_{t=1}^{t_{max}} f(t), \qquad (2)$$

where $t_{lim}$ – time to reach limited media capacity;
$t_{max}$ – time to reach maximum media capacity;
$f(t)$ – incoming data function;
$T$ – partition step equal to the unit of the minimum selected time scale.
    The forecasting task is to find the points in time at which the limited capacity and the maximum
capacity of each medium are reached [5].
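
    In the discrete form of equations (1) and (2), this search amounts to accumulating the forecast
inflow until a threshold is crossed. The sketch below illustrates the idea under stated assumptions:
the function name, the optional starting volume and the sample numbers are hypothetical, and the
inflow list stands for values of f(t) produced by some forecast.

```python
def time_to_reach(volume, inflow, step=1.0, start_volume=0.0):
    """First step t (1-based) at which start_volume + T * sum(f(1..t)) >= volume.

    inflow holds forecast values f(1), f(2), ...; step is the partition step T.
    Returns None if the volume is not reached within the forecast horizon.
    """
    total = start_volume
    for t, f_t in enumerate(inflow, start=1):
        total += step * f_t
        if total >= volume:
            return t
    return None

# Hypothetical forecast: 10 GB arrive in every time step.
forecast_inflow = [10.0] * 24
v_lim, v_max = 80.0, 100.0

t_lim = time_to_reach(v_lim, forecast_inflow)
t_max = time_to_reach(v_max, forecast_inflow)
print("t_lim =", t_lim, "t_max =", t_max, "time for scaling =", t_max - t_lim)
```

    The difference between the two results is the time available for scaling the medium before it
becomes full.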
    To solve this problem, it is necessary to predict the amount of traffic entering the storage system.
    The forecast can be made by various methods, but the peculiarities of the data stream being
recorded must be taken into account. Because user activity varies between weekends and working days,
vacation periods, etc., the incoming data stream is heterogeneous and has a seasonal
structure (Fig. 7).
    In [6,9], different forecasting methods were compared: statistical forecasting using
an autoregressive integrated moving average (ARIMA) model, and machine learning methods.
The results showed that the ARIMA model is the most suitable for short-term forecasts (Fig. 8), while
machine learning methods are the most suitable for mid-term forecasts (Fig. 9).
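
    The way a seasonal stream can be forecast and scored by mean squared error (MSE) can be shown with
a deliberately simplified model. The sketch below is not the ARIMA model or the machine learning
methods compared in [6,9]: it is a stand-in that removes the daily cycle by seasonal differencing and
fits a first-order autoregression to the differences. The traffic series is synthetic, and the function
names and the 24-hour period are assumptions.

```python
import math
import random

def fit_ar1(series):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + noise."""
    num = sum(cur * prev for cur, prev in zip(series[1:], series[:-1]))
    den = sum(prev * prev for prev in series[:-1])
    if den == 0:
        return 0.0
    # Clamp the coefficient so that the forecast cannot diverge.
    return max(-0.99, min(0.99, num / den))

def seasonal_ar_forecast(history, steps, period=24):
    """Forecast `steps` values: seasonal differencing plus AR(1) on the differences."""
    diffs = [history[t] - history[t - period] for t in range(period, len(history))]
    phi = fit_ar1(diffs)
    extended = list(history)
    diff = diffs[-1]
    for _ in range(steps):
        diff *= phi                                # AR(1) prediction of the next difference
        extended.append(extended[-period] + diff)  # undo the seasonal differencing
    return extended[len(history):]

def mse(actual, predicted):
    """Mean squared error between two equally long sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Synthetic hourly traffic in GB: a daily cycle plus noise. Not real operator data.
rng = random.Random(1)
traffic = [8 + 5 * math.sin(2 * math.pi * t / 24) + rng.gauss(0, 0.3)
           for t in range(8 * 24)]
train, test = traffic[:-24], traffic[-24:]

prediction = seasonal_ar_forecast(train, steps=len(test))
print(f"MSE over the held-out day: {mse(test, prediction):.3f}")
```

    The same train/test split and MSE function can be used to score any other forecaster, which is how
competing methods are usually compared on one data set.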
Figure 7: Incoming LTE traffic data stream (MTS); axes: V, GB versus t, hours



Figure 8: Difference between real traffic and the traffic predicted using the ARIMA model (mid-term
forecast): MSE = 0.04; axes: V, GB versus t, hours



Figure 9: Difference between real traffic and the traffic predicted using machine learning methods
(mid-term forecast): a – decision tree, MSE = 0.047; b – random forest, MSE = 0.047; axes: V, GB
versus t, days; legend: Real Traffic, Prediction

   To implement the forecast mechanism, an application was developed. This application helps
automate the process of predicting the capacity extension of each cell of the storage matrix [10].
   Thus, for the further implementation of the hardware-software system, it is necessary to develop a
programmable logic controller that distributes files inside the physical data storage.
   Programmable logic controllers are widely used in automatic control systems. The performance of
modern controllers allows them to run efficient control algorithms such as neural networks.

3. Conclusion
    Current legislation obliges the owners of technological communication networks to
store a large amount of data using their own data storage systems. This leads to serious costs, which
ultimately fall on the end user of communication services.
    At the same time, modern technologies make it possible to create systems for managing physical
data storage that consume physical storage resources efficiently. Existing virtualization
technologies make it possible to create structures containing various types of storage media and
distribute the saved traffic files over them depending on certain characteristics of the files.
    Since the data storage needs to be scaled regularly, it is necessary to monitor its status and
scale only those media that are approaching their maximum capacity. Predicting capacity extension
allows for timely scaling.
    Thus, dividing the total incoming data stream by media, predicting capacity consumption, and
monitoring the state of physical data storage allow owners of technological communication networks
to use physical storage resources rationally and avoid unnecessary costs when increasing storage.

4. References
[1] The Ministry of Economic Development supported the draft law on three-year storage of
     technological networks traffic [MER podderzhalo zakonoproyekt o trekhletnem khranenii trafika
     tekhnologicheskikh setey]. https://tass.ru/ekonomika/9574021
[2] Communication networks exchange statistics. https://digital.gov.ru/ru/pages/statistika-otrasli/
[3] Yarovaya's law has been strengthened in hardware [Zakon Yarovoy usililsya apparatno]. Newspaper
     "Kommersant" №185 (09.10.2020), p. 10. https://www.kommersant.ru/doc/4522028
[4] Tatarnikova T. M., Poymanova E. D. Algorithms for Placing Files in Tiered Storage
     Using Kohonen Map // Selected Papers of the IV All-Russian scientific and practical conference
     with international participation "Information Systems and Technologies in Modeling and Control"
     (ISTMC’2019), Yalta, Crimea, May 21-23, 2019, pp. 193-202
[5] Tatarnikova T. M., Poymanova E. D. Differentiated Capacity Extension Method for System of
     Data Storage with Multilevel Structure // Scientific and Technical Journal of Information
     Technologies, Mechanics and Optics, 2020, vol. 1, no. 1, pp. 66–73.
     doi:10.17586/2226-1494-2020-20-1-66-73
[6] Sovetov B. Ya., Tatarnikova T. M., Poymanova E. D. Organization of multi-level data storage.
     Informatsionno-Upravliaiushchie Sistemy [Information and Control Systems], 2019, no. 2, pp. 68–
     75 (In Russian). doi:10.31799/1684-8853-2019-2-68-75
[7] Bogatyrev V. A., Bogatyrev S. V., Derkach A. N. Timeliness of the Reserved Maintenance by
     Duplicated Computers of Heterogeneous Delay-Critical Stream // CEUR Workshop Proceedings,
     2019, vol. 2522, pp. 26-36
[8] Sovetov B. Ya., Tatarnikova T. M., Poymanova E. D. Storage scaling management model.
     Informatsionno-Upravliaiushchie Sistemy [Information and Control Systems], 2020, no. 5, pp. 43–
     49. doi:10.31799/1684-8853-2020-5-43-49
[9] Poymanova E. D., Tatarnikova T. M. Applying machine learning methods for forecasting // 2020
     Wave Electronics and its Application in Information and Telecommunication Systems (WECONF
     2020)
[10] Poymanova E. D., Tatarnikova T. M., Yagotintseva N. V. The Forecast Application for Capacity
     Extension of Data Storage Systems. Computer Registration Certificate RU 2019661945,
     12.09.2019. Application for registration № 2019619010, 22.07.2019.