<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Description and formation of the database perimeter for systematisation and storage of multi-structured data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>A A Nechitaylo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>O I Vasilchuk</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A A Gnutova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Samara National Research University</institution>
          ,
          <addr-line>Moskovskoe Shosse, 34А, Samara, Russia, 443086</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Volga Region State University of Service</institution>
          ,
          <addr-line>Gagarin st., 4, Togliatti, Russia, 445677</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>80</fpage>
      <lpage>86</lpage>
      <abstract>
        <p>For the storage of big data, relational databases are used as a rule. For multilateral research and analysis of the processes occurring in large economic systems, financiers, economists and other technical specialists use graphs with the actual names of enterprises, cities, regions, etc., moving from the physical names of the studied regions to the corresponding parameters of relational databases.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        “Big data” refers to the management and analysis of large amounts of data, a field that has
developed rapidly worldwide since 2011, as data analysis tools began to receive information from
increasingly diversely structured sources owing to the widespread introduction of digital
technologies in various fields (business, medicine, entertainment, etc.). In particular, according
to the Forecast of the socio-economic development of the Russian Federation for the period up to
2036, “the health care system will operate within a single digital circuit based on the unified state health
information system (EGISZ), which will make it possible to collect, store, process (“big data”) and analyse
large amounts of information” [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. One of the final goals of this work is the processing and
intelligent analysis of big data using parallel computations to create real-time decision-making systems.
To solve such problems, it is necessary to determine not only the relationships (algorithms, models,
etc.) between the final goal, the means of achieving it and the existing constraints, but also the forms for
describing and forming the database perimeter.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Formulation of the problem</title>
      <p>
        The task of synthesising rational schemes for choosing alternatives and evaluating their quality is to
choose the best (optimal) strategy from the set of competing strategies for solving a certain problem, based
on an analysis of the conditions and consequences of its implementation. A significant addition to
what has been said is that by conditions we mean not some fixed picture of today, but also the conditions
that can arise during the implementation of the strategy. Making well-grounded optimal decisions is
impossible without the steady and efficient acquisition of reliable large data arrays [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>Taking into account the above and recent trends, in the near future the main
sources of information will be the Internet of Things (IoT), social media, meteorological data, GPS
signals from vehicles, location data of mobile network subscribers, Google Trends, job search sites
and other alternative sources of information.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental research</title>
      <p>
        The authors conducted an Internet study of the availability of programmes working with
“big data” in the Russian-speaking community. The study showed that large companies (such as
Sberbank, Pyaterochka, etc.) are developing such services for their own purposes [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. As for small
business, we did not identify even the formulation of such tasks, which determines the relevance of the
goal of this work.
      </p>
      <p>
        In Russia, the Central Bank of the Russian Federation and the Federal Tax Service of the Russian
Federation pay particular attention to the systematisation and storage of multi-structured data. In this
regard, business has to solve a number of systemic and technological issues that prevent the
introduction of big data analysis into everyday practice. Among these issues are the lack of company
strategies for using the methods and data of big data analysis, the lack of modern technological
solutions, and the lack of relevant skills and understanding of the key streams of data generation [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>The study of the problems associated with introducing “big data” into the activities of
economic entities, aimed at ensuring economic security and business development, shows that the
strengthening of control by the Central Bank of the Russian Federation and the Federal Tax Service
of the Russian Federation is directed, first of all, at the formation of a database perimeter for the
systematisation and storage of multi-structured data of legal entities in a single information space.</p>
      <p>Central banks around the world have created or are creating big data departments
in order to better understand the economies they manage, in the hope of one day
obtaining technologies that allow them to monitor the state of the economy in real time. The current
global trend is presented in Table 1.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>The current global trend in the use of big data by central banks.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Country / region</th>
              <th>Use of big data by the regulator</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Japan</td>
              <td>The Bank of Japan has been using big data since 2013 to analyse economic statistics, which helps the regulator build more accurate forecasts. It plans to use big data to collect economic data directly, instead of relying on survey results.</td>
            </tr>
            <tr>
              <td>China</td>
              <td>The People’s Bank of China intends to use big data, artificial intelligence and cloud computing more actively to improve its ability to recognise, prevent and reduce cross-industry and cross-market financial risks. Big data is also of interest for tracking consumers and, above all, for monitoring debtors: among China’s main problems are the rapid formation of “bubbles” and the tendency of the population to participate in financial pyramids. In May, the central bank announced plans to use big data together with artificial intelligence to track such risks.</td>
            </tr>
            <tr>
              <td>USA</td>
              <td>In monetary policy decision making, the regulator continues to rely on traditional data sets. Economists at the Federal Reserve System (FRS) use big data when studying specific issues, such as the dynamics of consumer and government spending after hurricanes. Nevertheless, the Fed sees many shortcomings in big data, above all the limited time periods covered by these oversaturated data sets, which significantly reduces their value for forecasting. In addition, data sets are often produced by private companies pursuing aims other than economic analysis, which can make big data less reliable, so the Fed is wary of using it for policy development. (Commercial banks: more than 60% of banks in North America believe that big data gives a competitive advantage, and more than 90% believe that whoever masters big data will win in the future, yet only 37% of banks have working projects.)</td>
            </tr>
            <tr>
              <td>Eurozone</td>
              <td>The ECB has been exploring big data since 2013. Information on approximately 40 thousand daily money-market transactions will form the basis of an alternative rate, since traditional benchmarks are becoming unreliable. The regulator has also acquired a large set of pricing data on actual consumer purchases and is exploring web-scraping methods to measure inflation in real time. ECB analysts track Google Trends to assess changes in unemployment and use algorithms that analyse media reports to assess whether the regulator’s rhetoric is viewed as “hawkish” or “dovish”. However, the ECB remains cautious: just as fake news can dominate social media, there is a risk that fake news, or at least low-quality statistics, will crowd out better data in public discourse.</td>
            </tr>
            <tr>
              <td>United Kingdom</td>
              <td>The Big Data Board, now called the Data Management Team, has been created, together with a data laboratory and an analytical unit. Bank of England analysts recently used big data to gauge the effects of exchange-rate changes, and have also created a platform for trade repository data.</td>
            </tr>
            <tr>
              <td>India</td>
              <td>India faces security and privacy concerns, so the country’s central bank is concerned above all with cybersecurity in the context of big data.</td>
            </tr>
            <tr>
              <td>Singapore</td>
              <td>Singapore has created a Data Analysis Group, whose task is to collect big data to be analysed manually, without AI technology. The main task, as in India, is the fight against money laundering and terrorism.</td>
            </tr>
            <tr>
              <td>Indonesia</td>
              <td>The Statistics Department of the Bank of Indonesia explores social networks, news sites and other sources to analyse consumer sentiment, and recently began receiving data from online stores.</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        Figure 1 below illustrates the use of “big data” by banks to predict US home sales through Google
Trends. The technique is based on the fact that people search for houses most intensively immediately
before buying [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>[Figure 1: predicted versus actual US home sales; the panels show the number of homes sold monthly and the monthly housing price index.]</p>
      <p>As can be seen, managerial decisions are formed on the basis of the information received and the way
it is transferred between the functional units of the organisation. Its quality, reliability and timeliness
directly influence the effectiveness of the management decision.</p>
      <p>
        The age of information technology allows us to form, consolidate and modernise information,
which gives rise to problems of information excess and deteriorating information quality [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        According to experts, the share of useful information relative to all information received
will decline from year to year. By no means all data is valuable: according to
IDC estimates, by 2020 useful information will account for only 35% of the total generated [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Solution description</title>
      <p>
        For a manager to use the information received effectively, it is necessary to determine
correctly whether the information obtained is useful and important for making
management decisions, and only then choose the right toolkit (algorithms, models, systems,
competences, etc.). An experimental comparison of relational and non-relational databases, conducted by
the authors, confirms the expert assessment that managing the thousands of attributes
required for economic research in relational databases is inefficient [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>Consequently, the problem of describing and forming the perimeter of a database for the
systematisation and storage of multi-structured documentary data becomes highly relevant for the economy.
A schematic representation is shown in Figure 2.</p>
      <p>Document databases are intuitive to developers, since data at the application level is usually
presented as a JSON document. Developers can save data using the same document model that they
use in the application code. In a document database, all documents may have the same or different
data structure. Each document is self-describing (that is, it contains a schema that can be unique) and
does not necessarily depend on any other document. Documents are grouped into “collections,” which
are similar in function to tables in relational databases.</p>
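      <p>The analogy between collections and tables can be sketched in a few lines of Python (a minimal illustration using only the standard json module; the documents and field names are invented for the example):</p>

```python
import json

# A "collection" is simply a list of self-describing JSON documents;
# unlike rows in a relational table, they need not share one schema.
collection = [
    json.loads('{"title": "Movie A", "year": 2013, "genres": ["Drama"]}'),
    json.loads('{"title": "Movie B", "year": 2015}'),  # no "genres" field
]

# A query over the collection plays the role of SELECT ... WHERE on a table.
recent_titles = [doc["title"] for doc in collection if doc["year"] >= 2015]
print(recent_titles)
```

      <p>The point of the sketch is only that each document carries its own structure, so a missing field in one document does not invalidate the collection.</p>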
      <p>For example, JSON documents describing items in a simple movie database may look like
the following code.</p>
      <preformat>
[
  {
    "year": 2013,
    "title": "Turn It Down, Or Else!",
    "info": {
      "directors": ["Alice Smith", "Bob Jones"],
      "release_date": "2013-01-18T00:00:00Z",
      "rating": 6.2,
      "genres": ["Comedy", "Drama"],
      "image_url": "http://ia.mediaimdb.com/images/N/O9ERWAU7FS797AJ7LU8HN09AMUP908RLlo5JF90EWR7LJKQ7@@._V1_SX400_.jpg",
      "plot": "A rock band plays their music at high volumes, annoying the neighbors.",
      "actors": ["David Matthewman", "Jonathan G. Neff"]
    }
  },
  {
    "year": 2015,
    "title": "The Big New Movie",
    "info": {
      "plot": "Nothing happens at all.",
      "rating": 0
    }
  }
]
      </preformat>
      <p>When using a document database, each entity monitored by the application can be stored as a
separate document. A document database allows the developer to update the
application conveniently as requirements change. If the data model needs to change, only the
documents affected by the change need to be updated; there is no need to update the
schema and interrupt the operation of the database. When using a document database, the attributes of each transaction
can be described in one document, which simplifies management and improves read speed.
Changing the attributes of one transaction will not affect other transactions.</p>
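      <p>A minimal sketch of this behaviour, with plain Python dictionaries standing in for documents (no real database engine is assumed; the transaction data is invented for the example):</p>

```python
# Two "transaction" documents in one collection; each is self-describing.
transactions = {
    "t1": {"amount": 100, "currency": "RUB"},
    "t2": {"amount": 250, "currency": "RUB"},
}

# A new requirement adds a "status" attribute. Only the affected document
# is updated; there is no schema to migrate and no other document changes.
transactions["t1"]["status"] = "cleared"
```

      <p>In a real document database such as MongoDB the same idea appears as an update of a single document; the essential property is that documents evolve independently of one another.</p>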
      <p>An analysis of popular document databases (Amazon DocumentDB (compatible with MongoDB),
Amazon DynamoDB, MongoDB and Couchbase), based on the literature and expert opinions, showed the
promise of using MongoDB to solve economic problems, using the AWS MongoDB Quick Start
(also available in PDF format) for deploying a MongoDB cluster in the AWS cloud.</p>
      <p>In solving the problems of organising modern production, it is necessary to take into account an
increasing number of factors of different natures, which are the subject of research in various fields of
knowledge. Under these conditions, one person cannot choose the factors influencing the
achievement of a goal or determine the essential interrelationships between goals and
means. The formation and analysis of the decision-making model should therefore involve
development teams of specialists from various fields of knowledge, between whom
interaction and mutual understanding must be organised. The problem of decision making
thus becomes a problem of collective choice of goals, criteria, means and options for achieving the goal,
i.e. a problem of collective decision-making based on modern methods of processing big data. As a result,
the formulation of the problem becomes a problem in itself, whose solution requires
special approaches, techniques and methods. In such cases, it is necessary
to determine the scope of the decision-making problem (the problem situation), identify the factors
influencing its solution, and choose techniques and methods that allow the task to be formulated
so that a decision can be made.</p>
      <p>If it is possible to obtain an expression (algorithm, methodology, etc.) connecting the goal with the
means, then the problem is almost always solvable. Such expressions can represent not only simple
relations, similar to those considered, but also more complex, composite criteria (indicators) of
additive or multiplicative form. Of course, computational difficulties may then arise whose
resolution may require returning to the formulation of the problem. However, the resulting formalised
representation of the task allows us to apply further formalised methods for analysing the problem
situation.</p>
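      <p>As a hedged illustration of such composite criteria (all weights and partial indicators below are invented for the example), the additive form is a weighted sum of partial indicators, while the multiplicative form is a weighted geometric mean:</p>

```python
# Partial indicators of how well each means serves the goal, normalised to
# [0, 1], with importance weights that sum to 1 (values invented here).
indicators = [0.8, 0.5, 0.9]
weights = [0.5, 0.3, 0.2]

# Additive composite criterion: weighted sum of the partial indicators.
additive = sum(w * x for w, x in zip(weights, indicators))

# Multiplicative composite criterion: weighted geometric mean; a single
# near-zero indicator drags the whole criterion down, unlike the sum.
multiplicative = 1.0
for w, x in zip(weights, indicators):
    multiplicative *= x ** w
```

      <p>The choice between the two forms is itself a modelling decision: the multiplicative form penalises any single poor indicator much more strongly than the additive one.</p>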
      <p>Decision making is a scientific direction that began to take shape in the middle of the last century.
Its task is the synthesis of rational schemes for choosing alternatives and evaluating
their quality, which consists in choosing the best (optimal) strategy from the set of competing strategies for
solving a certain problem. A significant addition to the last phrase is that the conditions are understood not
as some frozen picture of “today”, but also as those conditions that may arise during the implementation
of the strategy.</p>
      <p>This scientific direction is distinguished by the fact that the choice of the optimality criterion must
be approached creatively. In this approach, the optimality criterion is not the
extremum of a function of one variable, but a region of a multidimensional feature space in which
some particular parameters may be non-optimal. That is, the particular utility functions are considered not
as equal, but as a hierarchically ordered system of utility functions with different weights (the choice of
which, along with the choice of the functions themselves, is in fact the content of the decision-making
process).</p>
      <p>Thus, in order to make a decision, it is necessary to obtain an expression linking the goal with
the means of achieving it, using the introduced criteria for assessing the attainability of the goal and for
evaluating the means. If such an expression is obtained, the problem is solved.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In the classical theory of decision making, the central question is associated with the axioms of
“rational” choice. As a result, when the methods of classical decision theory are applied,
the choice is reduced to binary preference relations. However, the classical rational bases of choice are
not universal; they represent only a limited part of the grounds on which reasonable and natural
decision-making mechanisms can be built. To simplify the construction and interaction of
these mechanisms (algorithms, techniques, etc.) for different sectors of the national economy, it is
advisable to build typical perimeters (possibly interfaces) of big data collection and storage bases.</p>
      <p>The number and complexity of problems for which the performance criterion cannot be obtained
immediately in analytical form is growing, and as civilisation develops, the price of a wrong decision
also increases. Decision-making problems are, as a rule, characterised by a combination of qualitative
and quantitative methods. Decision-making in industrial control systems
is often associated with a lack of time: it is better to make not the best decision, but one made in the required
time, because otherwise even the best solution may no longer be needed. The decision therefore often has
to be taken under incomplete information (its uncertainty or deficit), and it is necessary to
ensure that the most relevant decision-making information and the most objective preferences
underlying the decision can be determined as quickly as possible.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] Central banks use big data to form financial policy URL: http://www.vestifinance.ru/articles/95398 (20.12.2018)</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] Forecast of the socio-economic development of the Russian Federation for the period up to 2036 URL: http://economy.gov.ru/wps/wcm/connect/9e711dab-fec8-4623-a3b1-33060a39859d/prognoz2036.pdf?MOD=AJPERES&amp;CACHEID=9e711dab-fec8-4623-a3b1-33060a39859d (15.11.2018)</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] How big is the internet? URL: https://geektimes.ru/company/asus/blog/275032/ (25.10.2018)</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] How Central Banks Are Using Big Data to Help Shape Policy URL: https://www.bloomberg.com/news/articles/2017-12-18/central-banks-are-turning-to-big-data-to-help-them-craft-policy (15.11.2018)</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] The Future of Prediction: How Google Searches Foreshadow Housing Prices and Sales URL: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2022293 (05.11.2018)</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] The problems of making effective management decisions URL: https://cyberleninka.ru/article/n/problemy-prinyatiya-effektivnogo-upravlencheskogo-resheniya (25.10.2018)</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] Vashko T A 2011 Information duplication technology as a means of improving the quality of decision making Problems of the Modern Economy 4 137-141</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] Kazanskiy N L 2017 Efficiency of deep integration between a research university and an academic institute Procedia Engineering 201 817-831 DOI: 10.1016/j.proeng.2017.09.604</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] Kazanskiy N L, Protsenko V I and Serafimovich P G 2017 Performance analysis of real-time face detection system based on stream data mining frameworks Procedia Engineering 201 806-816 DOI: 10.1016/j.proeng.2017.09.602</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Protsenko V I, Serafimovich P G, Popov S B and Kazanskiy N L 2016 Software and hardware infrastructure for data stream processing CEUR Workshop Proceedings 1638 782-787 DOI: 10.18287/1613-0073-2016-1638-782-787</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] Kazanskiy N L, Protsenko V I and Serafimovich P G 2014 Comparison of system performance for streaming data analysis in image processing tasks by sliding window Computer Optics 38(4) 804-810</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>