<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Recording and Storage Traffic Management in Storage Systems</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Tatyana</forename><surname>Tatarnikova</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Russian State Hydrometeorological University</orgName>
								<address>
									<addrLine>ul. Voronezhskaya, 79</addrLine>
									<postCode>192007</postCode>
									<settlement>St. Petersburg</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ekaterina</forename><surname>Poymanova</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Russian State Hydrometeorological University</orgName>
								<address>
									<addrLine>ul. Voronezhskaya, 79</addrLine>
									<postCode>192007</postCode>
									<settlement>St. Petersburg</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ekaterina</forename><surname>Kraeva</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Russian State Hydrometeorological University</orgName>
								<address>
									<addrLine>ul. Voronezhskaya, 79</addrLine>
									<postCode>192007</postCode>
									<settlement>St. Petersburg</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Recording and Storage Traffic Management in Storage Systems</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">B022DDDC157CCBFC2DE7A42B85DD420A</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T00:35+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>traffic</term>
					<term>data storage system</term>
					<term>data distribution</term>
					<term>physical data storage</term>
					<term>machine learning</term>
					<term>neural network</term>
					<term>forecasting</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The article discusses a comprehensive solution for managing the recording and storage of traffic in data storage systems. Under current legislation, storing large volumes of data has become a pressing problem. Managing physical storage helps avoid unnecessary costs when scaling storage systems. The article proposes the structure of a hardware-software system for managing physical data storage that owners of technological communication networks can use to store traffic. The control mechanisms considered include distributing data across different media using Kohonen neural networks and forecasting capacity extension using a statistical model and machine learning methods.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The requirements of modern legislation in the field of citizen security pose serious challenges to many organizations, including in the area of data storage. The anti-terrorist amendments adopted in 2016 (the so-called "Yarovaya law") obliged telecom operators to store traffic metadata for three years and the traffic itself for six months. In addition, in June 2020, the Ministry of Digital Development, Communications and Mass Media of the Russian Federation proposed a bill under which the owners of technological communication networks would be required to store traffic for three years <ref type="bibr" target="#b0">[1]</ref>.</p><p>There is also a legislative norm obliging operators to increase the capacity of traffic storage by 15% annually. Although the government has postponed the introduction of this norm by one year, the problem of using and extending the physical resources of storage remains acute.</p><p>Over the past four years, the volume of Internet traffic grew from 32,470.782391 PB to 61,226.217838 PB (Fig. <ref type="figure" target="#fig_0">1</ref>) <ref type="bibr" target="#b1">[2]</ref>; that is, it has almost doubled. Consequently, the norm of a 15% annual increase in storage capacity is insufficient, even though implementing it already requires significant financial costs. In October 2020, Rostelecom alone spent 7.8 billion rubles on data storage equipment. Other operators also purchase various storage systems (Table <ref type="table" target="#tab_0">1</ref>) <ref type="bibr" target="#b2">[3]</ref>. As Table <ref type="table" target="#tab_0">1</ref> shows, telecom operators of the Russian Federation incur considerable costs when purchasing equipment for storage systems. 
On the other hand, modern information technologies make it possible to manage the resources of data storage systems, use them efficiently and thereby avoid unnecessary costs.</p><p>A data storage system can manage the recording of the incoming data stream in two ways. First, it can distribute the data among different types of media. Second, it can monitor the state of the storage and forecast capacity growth so that the storage is extended in time.</p></div>
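<div xmlns="http://www.tei-c.org/ns/1.0"><p>The argument that a 15% annual capacity increase cannot keep pace with traffic growth can be checked with simple compound-growth arithmetic. The Python sketch below assumes that the two quoted volumes are the 2016 and 2019 values shown in Figure 1, which places them three annual steps apart. The function name growth_summary is hypothetical, and the calculation is illustrative only.</p><p>
```python
# Compare observed traffic growth with a fixed annual capacity increase.
# Assumption: the two volumes quoted in the text are the 2016 and 2019 values
# of Figure 1, i.e. they are three annual steps apart.


def growth_summary(v_start: float, v_end: float, steps: int,
                   norm: float = 0.15) -> dict:
    """Return total and annual traffic growth next to the capacity growth
    obtained by raising capacity by norm every year for steps years."""
    ratio = v_end / v_start                 # total traffic growth factor
    annual = ratio ** (1.0 / steps) - 1.0   # equivalent compound annual growth
    capacity_ratio = (1.0 + norm) ** steps  # capacity growth under the norm
    return {
        "traffic_ratio": ratio,
        "annual_traffic_growth": annual,
        "capacity_ratio": capacity_ratio,
        "norm_sufficient": capacity_ratio >= ratio,
    }


summary = growth_summary(32_470.782391, 61_226.217838, steps=3)
for key, value in summary.items():
    # floats are rounded for display; the boolean is printed as-is
    print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```
</p><p>Under these assumptions, the traffic ratio is about 1.886, which corresponds to compound growth of roughly 23.5% per year. Three years of 15% increases give a capacity ratio of only about 1.521.</p></div>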
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Managing Physical Data Storage When Recording and Storing Traffic</head><p>A study has been carried out in which a data storage system is considered as a storage management system that performs the following functions <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5]</ref>:</p><list><item>distribution of data files across different types of media, depending on file size and storage time;</item><item>monitoring of the storage state based on snapshots of the state of each medium;</item><item>forecasting of storage capacity extension.</item></list><p>The storage management system diagram is shown in Figure <ref type="figure" target="#fig_1">2</ref>. Implementing such a storage management system requires a software-hardware system that performs the above functions.</p><p>The proposed system includes two components. A programmable logic controller (PLC) distributes files across media. Software monitors the state of the physical storage and forecasts its extension (Figure <ref type="figure" target="#fig_2">3</ref>).</p><p>The controller receives the incoming data stream (for example, Internet traffic) and clusters it using Kohonen neural networks. According to the resulting topological map, it then distributes data files to media in the physical data storage.</p><p>Physical storage can be organized according to the kind of information being recorded. Paper <ref type="bibr" target="#b5">[6]</ref> considered a 3x3 matrix storage. Files are first assigned to one of the storage levels according to their storage time and then distributed among the volumes of that level according to their size.</p><p>This solution can easily be adapted to the needs of the owners of technological communication networks <ref type="bibr" target="#b6">[7]</ref>. 
Under existing legislation and the upcoming amendments, the storage time of both data files and metadata is limited to three years. Data files can therefore be distributed across different types of media depending on the type of data (text, audio, video) and on file size. The structure of the physical data storage is determined by the storage system administrator. It can include, for example, RAID arrays for text files and tape drives (streamers) for audio and video files. In addition, volumes inside a RAID array can be formatted with different file systems that use different logical block sizes. This avoids partially filled ("under-filled") logical blocks when files are written (Figures <ref type="figure" target="#fig_4">4, 5</ref>).</p><formula xml:id="formula_0">V_{\mathrm{max}}^{mn} = \int_{1}^{t_{\mathrm{max}}^{mn}} f(t)\,dt = T \sum_{t=1}^{t_{\mathrm{max}}^{mn}} f(t),<label>(2)</label></formula><p>where $t_{\mathrm{lim}}^{mn}$ is the time to reach the limited media capacity; $t_{\mathrm{max}}^{mn}$ is the time to reach the maximum media capacity; $f(t)$ is the incoming data function; $T$ is the partition step, equal to the unit of the minimum selected time scale.</p><p>The forecasting task is to find the point on the timeline at which each medium reaches its limited capacity and its maximum capacity <ref type="bibr" target="#b4">[5]</ref>.</p><p>To solve this problem, the amount of traffic entering the storage system must be predicted. Various forecasting methods can be used, but they must take into account the peculiarities of the data stream being recorded. User activity is uneven (weekends versus working days, vacation periods, etc.), so the incoming data stream is heterogeneous and has a seasonal structure (Fig. <ref type="figure">7</ref>).</p><p>In <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b8">9]</ref>, different forecasting methods were compared: statistical forecasting with an autoregressive integrated moving average (ARIMA) model, and machine learning methods. 
The results showed that the ARIMA model is best suited to short-term forecasts (Fig. <ref type="figure">8</ref>), whereas machine learning methods are better suited to mid-term forecasts (Fig. <ref type="figure" target="#fig_7">9</ref>). To implement the forecasting mechanism, an application was developed that automates predicting the capacity extension of each cell of the storage matrix <ref type="bibr" target="#b9">[10]</ref>.</p><p>Thus, to complete the hardware-software system, a programmable logic controller that distributes files inside the physical data storage still has to be developed.</p><p>Programmable logic controllers are widely used in automatic control systems. Modern controllers are powerful enough to run efficient control algorithms such as neural networks.</p></div>
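<div xmlns="http://www.tei-c.org/ns/1.0"><p>The clustering step performed by the controller can be illustrated with a minimal Kohonen self-organizing map. The Python sketch below is not the authors' controller implementation, and it rests on several assumptions:</p><list><item>each file is described by two normalized features, log-scaled file size and retention time;</item><item>the map is 3x3, so that each node can be associated with one cell of the 3x3 matrix storage discussed above;</item><item>the names features, train_som and best_matching_unit are hypothetical, and all hyperparameters are illustrative.</item></list><p>
```python
import math
import random

GRID = 3            # assumption: 3x3 map, one node per cell of the 3x3 matrix storage
MAX_SIZE = 1e12     # assumed largest file size in bytes, used for normalization
MAX_DAYS = 3 * 365  # three-year retention period used as the upper bound


def features(size_bytes, retention_days):
    """Describe a file by two features in [0, 1]: log-scaled size and retention."""
    size = math.log10(1 + size_bytes) / math.log10(1 + MAX_SIZE)
    return [size, retention_days / MAX_DAYS]


def best_matching_unit(weights, x):
    """Return (row, column) of the node whose prototype vector is closest to x."""
    best, best_d = (0, 0), math.inf
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))  # squared Euclidean distance
            if best_d > d:
                best, best_d = (i, j), d
    return best


def train_som(samples, grid=GRID, epochs=100, lr0=0.5, sigma0=1.5, seed=0):
    """Train a grid x grid Kohonen map with a Gaussian neighbourhood function."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(grid)]
               for _ in range(grid)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.3)  # neighbourhood radius shrinks
        for x in rng.sample(samples, len(samples)):  # one shuffled pass
            bi, bj = best_matching_unit(weights, x)
            for i in range(grid):
                for j in range(grid):
                    d2 = (i - bi) ** 2 + (j - bj) ** 2  # distance on the map grid
                    h = math.exp(-d2 / (2.0 * sigma ** 2))
                    w = weights[i][j]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])  # move prototype towards x
    return weights


# Usage on synthetic files: three typical sizes and three retention periods.
gen = random.Random(1)
files = [(gen.choice([2e3, 5e6, 4e9]) * gen.uniform(0.5, 2.0),
          gen.choice([180, 365, 1095])) for _ in range(300)]
som = train_som([features(s, d) for s, d in files])
print(best_matching_unit(som, features(3e3, 180)))   # small, short-lived file
print(best_matching_unit(som, features(5e9, 1095)))  # large, long-lived file
```
</p><p>After training, best_matching_unit returns the node (row, column) that identifies a file's cluster. Mapping that node to a concrete level and volume of the physical storage depends on the structure chosen by the administrator. That mapping is an assumption of this sketch.</p></div>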
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Conclusion</head><p>The norms of modern legislation oblige the owners of technological communication networks to store large amounts of data in their own data storage systems. This leads to serious costs, which ultimately fall on the end users of communication services.</p><p>At the same time, modern technologies make it possible to create systems for managing physical data storage that use physical storage resources efficiently. Existing virtualization technologies make it possible to create structures containing various types of storage media. Saved traffic files can then be distributed over these media according to certain file characteristics.</p><p>Since the data storage must be scaled regularly, its state has to be monitored so that only the media approaching their capacity limits are scaled. Forecasting capacity extension makes timely scaling possible.</p><p>Thus, three measures together help owners of technological communication networks use physical storage resources rationally and avoid unnecessary costs when expanding storage: distributing the total incoming data stream across media, forecasting capacity consumption, and monitoring the state of the physical data storage.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Growth in the amount of traffic in 2016-2019</figDesc><graphic coords="1,0.00,191.15,594.96,459.74" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Data Storage Management System</figDesc><graphic coords="2,124.25,485.21,345.90,184.65" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Structure of Hardware and Software System</figDesc><graphic coords="3,143.75,248.44,306.75,253.50" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Structure of Physical Storage</figDesc><graphic coords="3,164.00,621.91,266.95,115.50" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Comparison of files with the same amount of data stored in logical blocks of different sizes</figDesc><graphic coords="4,208.15,96.65,178.68,72.30" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Characteristics of Data Media</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 7 :Figure 8 :</head><label>78</label><figDesc>Figure 7: Incoming LTE traffic data stream of MTS</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 9 :</head><label>9</label><figDesc>Figure 9: Difference between real traffic and traffic predicted using machine learning methods (mid-term forecast): (a) decision tree, MSE = 0.047; (b) random forest, MSE = 0.047</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Costs of telecom operators for data storage systems</figDesc><table><row role="label"><cell>No.</cell><cell>Month, year</cell><cell>Company</cell><cell>Cost, $ million</cell></row><row><cell>1</cell><cell>May 2018</cell><cell>MegaFon</cell><cell>12.64</cell></row><row><cell>2</cell><cell>March 2019</cell><cell>MTS</cell><cell>191.66</cell></row><row><cell>3</cell><cell>October 2020</cell><cell>Rostelecom</cell><cell>106.78</cell></row><row><cell>4</cell><cell>planned</cell><cell>Tele2</cell><cell>58.87</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<ptr target="https://tass.ru/ekonomika/9574021" />
		<title level="m">The Ministry of Economic Development supported the draft law on three-year storage of technological network traffic [MER podderzhalo zakonoproyekt o trekhletnem khranenii trafika tekhnologicheskikh setey]</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<ptr target="https://digital.gov.ru/ru/pages/statistika-otrasli/" />
		<title level="m">Communication networks exchange statistics</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Yarovaya&apos;s law has been strengthened in hardware</title>
		<ptr target="https://www.kommersant.ru/doc/4522028" />
	</analytic>
	<monogr>
		<title level="j">Kommersant</title>
		<imprint>
			<biblScope unit="volume">185</biblScope>
			<biblScope unit="issue">10</biblScope>
			<biblScope unit="page">10</biblScope>
			<date type="published" when="2020-10-09">09.10.2020</date>
		</imprint>
	</monogr>
	<note>Newspaper</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Algorithms for Placing Files in Tiered Storage Using Kohonen Map</title>
		<author>
			<persName><forename type="first">Tatiana</forename><forename type="middle">M</forename><surname>Tatarnikova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ekaterina</forename><forename type="middle">D</forename><surname>Poymanova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Selected Papers of the IV All-Russian Scientific and Practical Conference with International Participation &quot;Information Systems and Technologies in Modeling and Control&quot; (ISTMC&apos;2019)</title>
				<meeting><address><addrLine>Yalta, Crimea</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">May 21-23, 2019</date>
			<biblScope unit="page" from="193" to="202" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Differentiated Capacity Extension Method for System of Data Storage with Multilevel Structure</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">M</forename><surname>Tatarnikova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">D</forename><surname>Poymanova</surname></persName>
		</author>
		<idno type="DOI">10.17586/2226-1494-2020-20-1-66-73</idno>
	</analytic>
	<monogr>
		<title level="j">Scientific and Technical Journal of Information Technologies, Mechanics and Optics</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="66" to="73" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Organization of multi-level data storage</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">Ya</forename><surname>Sovetov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">M</forename><surname>Tatarnikova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">D</forename><surname>Poymanova</surname></persName>
		</author>
		<idno type="DOI">10.31799/1684-8853-2019-2-68-75</idno>
	</analytic>
	<monogr>
		<title level="j">Informatsionno-Upravliaiushchie Sistemy [Information and Control Systems]</title>
		<imprint>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="68" to="75" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note>In Russian</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Timeliness of the Reserved Maintenance by Duplicated Computers of Heterogeneous Delay-Critical Stream</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">A</forename><surname>Bogatyrev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">V</forename><surname>Bogatyrev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Derkach</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">2522</biblScope>
			<biblScope unit="page" from="26" to="36" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Storage scaling management model</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">Ya</forename><surname>Sovetov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">M</forename><surname>Tatarnikova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">D</forename><surname>Poymanova</surname></persName>
		</author>
		<idno type="DOI">10.31799/1684-8853-2020-5-43-49</idno>
	</analytic>
	<monogr>
		<title level="j">Informatsionno-Upravliaiushchie Sistemy [Information and Control Systems]</title>
		<imprint>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="43" to="49" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Applying machine learning methods for forecasting</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">D</forename><surname>Poymanova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">M</forename><surname>Tatarnikova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2020 Wave Electronics and its Application in Information and Telecommunication Systems (WECONF)</title>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">The Forecast Application for Capacity Extension of Data Storage Systems</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">D</forename><surname>Poymanova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">M</forename><surname>Tatarnikova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">V</forename><surname>Yagotintseva</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computer Program Registration Certificate RU 2019661945</title>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note>Application for registration No. 2019619010 of 22.07.2019</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
