<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Data Modelling to Analyze How the Cities in the Volga Region Correspondent to the Digital State Format</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>I N Khaimovich</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>V M Ramzaev</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>V G Chumak</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Samara National Research University</institution>
          ,
          <addr-line>Moskovskoe Shosse 34А, Samara, Russia, 443086</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Samara University of Public Administration International Market Institute</institution>
          ,
          <addr-line>G.S. Aksakova Street 21, Samara, Russia, 443030</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>46</fpage>
      <lpage>55</lpage>
      <abstract>
        <p>The article proposes a methodology for assessing the readiness of municipal entities of the Volga region to introduce the digital state. The authors developed a model of statistical tests based on repeated probability-theoretical and statistical modelling of the parameter values of the technological elements of the digital economy. This model allows the cities of the Volga region to determine whether they can participate in the State Program of Digital Economy, to choose the cities most suitable for introducing modern technologies, and to identify the main shortcomings hindering their integration into the program. This research may be of interest to experts in the field of digital economy and Big Data management.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>At present, Russia is making the transition to the digital development of the state and the digital
economy. Amid fierce competition among municipal entities, the introduction of information
technologies is becoming especially urgent. Municipal entities (MEs), i.e. city districts, are key
elements of the territorial organization of the country's economy: it is in them that new business,
financial and cultural centers that stimulate change are formed. The implementation of modern forms
and tools of the digital economy in MEs is currently constrained by the following problems: restricted
access to digital systems in municipalities, the depressed state of ME economies, the lack of social
involvement of local government in the management process, and the lack of a foundation for building
the digital economy.</p>
      <p>At the same time, further development of MEs is impossible without attracting new intellectual
services, one of which is the technology of the digital state. This raises the following questions:
- Are MEs ready to introduce digital state technologies, i.e. to eliminate intermediaries in services,
carry out direct transactions in the economy, etc.?
- What methodology can be used to check this readiness?</p>
      <p>When managing the development of the regional economy in the modern digital format, it is
necessary to solve the following tasks:</p>
      <p>- review the main ratings that assess how well MEs conform to the principles of the digital
state;</p>
      <p>- develop a methodology for assessing, through analysis of compliance data, how well MEs conform
to the principles of the "digital state";</p>
      <p>- analyze the cities of the Volga region using this methodology.</p>
      <p><bold>2. Analysis of existing ratings within the concept of the “digital state”</bold></p>
      <p>The concept of the "digital state" includes the following components:</p>
      <p>- "smart economy", which implies high values of indicators in the areas of innovation,
employment, trade, productivity and physical infrastructure;</p>
      <p>- "smart environment", which covers the air quality index, water supply, noise level, environmental
quality, biodiversity and power economy;</p>
      <p>- "smart society and culture", which covers education, health care, security, housing, culture and
social involvement.</p>
      <p>
        All these indicators should interact exclusively through information and communication technologies
(ICT). To analyze readiness and attract investment to MEs, the United Nations has developed the "Smart
City" rating [1], and a Russian rating for assessing sustainable urban development has also been
developed [
        <xref ref-type="bibr" rid="ref1">2</xref>
        ]. The results of the analysis are given in table 1.
      </p>
      <p>[Table 1. Ratings analyzed: the rating of sustainable development of cities in the RF and the Smart City system of indicators.]</p>
      <p>Let us consider in detail the rating of sustainable development of Russian cities proposed by the SGM
Agency. This rating includes more than thirty indicators that characterize MEs in terms of the economy,
municipal infrastructure, the social sphere and ecology. The sample consists of Russia's administrative
centers. The advantage of this rating is the balance of the indicators considered, since imbalance
adversely affects the sustainable development of MEs. Its disadvantages include the fact that high
scores do not always determine an ME's leading position in the country. The leading cities in this
ranking are the same every year: Moscow, St. Petersburg and Ufa, while the administrative centers of
the Volga region are ranked among the outsiders every year. In the context of the topic under
consideration, a further shortcoming is that the indicators of this rating are only loosely related to
the "digital state". For example, the "demography" indicator, consisting of the criteria "natural
growth rate", "migration growth rate" and "demographic burden", affects the use of ICT in all areas of
the city only indirectly.</p>
      <p>Next, consider the smart city metrics system developed by the United Nations Economic Commission
for Europe. The degree of readiness to introduce new ICTs is assessed through indicators of the city's
innovations aimed at improving the living standards of the population. The system assesses the
effectiveness of activities and services in meeting the needs of future generations across various
areas of activity.</p>
      <p>This system consists of the following three blocks:</p>
      <p>Block 1. Economy: ICT infrastructure, innovation, employment, trade (e-commerce, export/import),
productivity, physical infrastructure (water supply, electricity, health care, transport, buildings).</p>
      <p>Block 2. Environment: air quality, water supply, noise level, environmental quality, biodiversity,
power economy.</p>
      <p>Block 3. Society and culture: education, health, safety (disaster consequence management,
emergency security, ICT security), housing, culture, social involvement. The last indicator includes the
levels of public participation, gender income equality, opportunities for people with special needs,
attractiveness to qualified personnel and the Gini coefficient.</p>
      <p>The advantage of this system is the detailed calculation of all indicators. The disadvantage is that
this assessment can be applied only to European countries.</p>
      <p>After a detailed analysis of European and Russian ratings and evaluation systems, a methodology
was developed for assessing the readiness of cities to introduce "digital state" technologies, taking
into account the specific features of the Volga region [3-6].</p>
      <p><bold>3. Model of the involvement of Volga region MEs in “digital economy” technology</bold></p>
      <p>Despite a large number of studies in this field [7-11], there is no unified methodology for assessing the
involvement of MEs in “digital state” technology. International ratings and systems are still being
tested, and they answer specific questions: is the city's economy stable, and does the city have
elements of a "smart" city? They do not assess any indicators of the intellectualization of the urban
environment.</p>
      <p>Next, we consider a model of statistical tests based on repeated probability-theoretical and
statistical modeling of the parameter values of the "digital state" concept. This model involves
analyzing a large amount of data, which requires automating the calculation of the evaluation
indicators; it can therefore be built on Big Data technology [12,13].</p>
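      <p>The "model of statistical tests" is essentially the Monte Carlo method: parameter values are drawn
at random many times, and the share of outcomes of interest is counted. The Scala sketch below is a
minimal illustration under assumptions that are not taken from the paper: each indicator is assumed to
be uniformly distributed between a hypothetical lower and upper expert score, the ME score is the plain
average of its indicators, and the estimated quantity is the probability that this average reaches a
given threshold (such as 3.7, the readiness threshold used in this paper). The names
<monospace>IndicatorRange</monospace> and <monospace>StatisticalTests</monospace> are hypothetical.</p>
      <preformat preformat-type="code">
```scala
import scala.util.Random

// Hypothetical expert-score range of one indicator; values are assumed uniform on [low, high).
final case class IndicatorRange(name: String, low: Double, high: Double)

// Hypothetical Monte Carlo ("statistical tests") helper, not the authors' implementation.
object StatisticalTests {
  // One statistical test: draw a value for every indicator and average the draws.
  def drawAverage(ranges: Seq[IndicatorRange], rng: Random): Double =
    ranges.map(r => r.low + (r.high - r.low) * rng.nextDouble()).sum / ranges.size

  // Repeat the test `trials` times; return the share of trials whose average reaches `threshold`.
  def probabilityAtLeast(ranges: Seq[IndicatorRange], threshold: Double,
                         trials: Int, seed: Long): Double = {
    val rng = new Random(seed) // fixed seed so that runs are reproducible
    val hits = (1 to trials).count(_ => drawAverage(ranges, rng) >= threshold)
    hits.toDouble / trials
  }
}
```
      </preformat>
      <p>For example, a single indicator spread uniformly over [3, 5) reaches 3.7 with probability
(5 - 3.7) / 2 = 0.65, and the estimate approaches this value as the number of trials grows.</p>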
      <p>The data mining method used to support these objectives is as follows:</p>
      <p>1. Forming a big data set in Hadoop from Twitter using the filter “Samara region” and determining
the number of mentions;</p>
      <p>2. Dividing the resulting set by filters associated with the performance measures of ME involvement
in "digital state" technology;</p>
      <p>3. Stream monitoring and analysis of unstructured data sources using the filters;</p>
      <p>4. Developing a program in the Scala programming language that applies the filters to the Big Data
set;</p>
      <p>5. Debugging and testing the program on a set of practical data;</p>
      <p>6. Analyzing the calculation results.</p>
      <p>The social network Twitter is used to obtain the data, since it is an “open” product: using it
requires no additional investment, and 50% of Internet users have profiles on it. Twitter is the
second most popular social network among users worldwide, after Facebook. However, unlike Facebook,
which does not provide open access to its data, Twitter provides such access, with no restrictions on
access to the server's data sets. Users of this social network mainly exchange textual information,
which is a clear advantage for processing. Twitter is not an object network and reflects public opinion
on many issues of interest very widely, so data from this social network were the best option for
forming small business zones in the region.</p>
      <p>Working with Big Data from social networks requires methods of data collection, processing and
analysis. The data are collected in real time, within a certain geolocation or across the entire
network, according to certain patterns. The information of interest for analysis is: location, date and
time, content, the “author” of the content (user), and communication between users. Data collection in
social networks can be performed with the following tools: Apache Hadoop, BigInsights (IBM), Cloudera,
Hortonworks and Storm. Hortonworks was chosen to carry out the research on ME involvement in the
"digital economy". A Twitter application (apps.twitter.com) was used, in which the following key
parameters were defined: API key, API secret, Access token and Access token secret.</p>
      <p>To collect data with Hortonworks and the Twitter application, the Flume service configuration file
was used in the Hortonworks Sandbox virtual machine. After installing the Hortonworks Sandbox virtual
machine (version 2.3) and configuring the Flume service, the system is ready to download data from
Twitter. To view and download the files, go to the HDFS folder where the data are processed. The HDFS
file structure in the Hortonworks virtual machine for the task of ME involvement is shown in
figure 1.</p>
      <p>The collected data must be structured (i.e. processed) in accordance with the MapReduce paradigm.
MapReduce is a framework for performing distributed tasks using a large number of computers that form
a cluster.</p>
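      <p>To make the paradigm concrete, the sketch below reproduces its three phases with plain Scala
collections on a single machine: a map phase that emits (keyword, 1) pairs for each record, a shuffle
that groups the pairs by key, and a reduce phase that sums each group. It only mimics what Hadoop
distributes across a cluster; the object name <monospace>MapReduceSketch</monospace> and the
keyword-counting task are illustrative assumptions, not part of the study.</p>
      <preformat preformat-type="code">
```scala
// Hypothetical single-machine illustration of the MapReduce phases (not Hadoop's API).
object MapReduceSketch {
  // Map phase: emit a (keyword, 1) pair for every keyword the record mentions.
  // Keywords are expected in lower case; records are lower-cased before matching.
  def mapPhase(record: String, keywords: Seq[String]): Seq[(String, Int)] =
    keywords.filter(k => record.toLowerCase.contains(k)).map(k => (k, 1))

  // Shuffle phase: group the emitted values by key, as the framework does between phases.
  def shuffle(pairs: Seq[(String, Int)]): Map[String, Seq[Int]] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

  // Reduce phase: sum the values collected for each key.
  def reducePhase(groups: Map[String, Seq[Int]]): Map[String, Int] =
    groups.map { case (k, ones) => k -> ones.sum }

  // Whole job: map every record, shuffle, then reduce.
  def run(records: Seq[String], keywords: Seq[String]): Map[String, Int] =
    reducePhase(shuffle(records.flatMap(r => mapPhase(r, keywords))))
}
```
      </preformat>
      <p>Hadoop runs the same three phases, but splits the map work across the nodes of the cluster and moves
the grouped pairs over the network before reducing them.</p>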
      <p>Using MapReduce helped to structure the data stream from social networks by the following criteria:
font, text size, color, link to the user profile, location, time, etc.</p>
      <p>For the ME analysis in our study, data of the following types must be collected: location, text,
language and time. To extract only this information, the MapReduce technology built into the
Hortonworks Sandbox can be used. For data processing we use the Hive DBMS in the Hadoop environment,
which allows operations on the data and their analysis to be performed with SQL-like queries. To do
this, a file hiveddl.sql that creates the necessary tables must be written.</p>
      <p>This file is run with the command <monospace>hive -f hiveddl.sql</monospace>. The structured data are
placed in table 2.</p>
      <p>[Table 2. Structured data (columns include Date/Time) obtained from the social network Twitter.]</p>
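      <p>The projection of each record onto the four fields needed for the ME analysis (location, text,
language and time) can be sketched as follows. The sketch assumes each raw tweet has already been
parsed into a key-value map keyed by the Twitter field names <monospace>place</monospace>,
<monospace>text</monospace>, <monospace>lang</monospace> and <monospace>created_at</monospace>; in the real
stream <monospace>place</monospace> is a nested object, so treating it as a string is a simplification. The
names <monospace>TweetRecord</monospace> and <monospace>Structuring</monospace> are hypothetical; in the
study itself this step is done with MapReduce and Hive.</p>
      <preformat preformat-type="code">
```scala
// Hypothetical structured record holding the four fields used for the ME analysis.
final case class TweetRecord(location: String, text: String, language: String, time: String)

object Structuring {
  // Keep a raw record only if all four fields are present; any other fields are dropped.
  def structure(raw: Map[String, String]): Option[TweetRecord] =
    raw.get("place").flatMap { place =>
      raw.get("text").flatMap { text =>
        raw.get("lang").flatMap { lang =>
          raw.get("created_at").map(time => TweetRecord(place, text, lang, time))
        }
      }
    }

  // Structure a whole batch, skipping incomplete records.
  def structureAll(raws: Seq[Map[String, String]]): Seq[TweetRecord] =
    raws.flatMap(r => structure(r))
}
```
      </preformat>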
      <p>Big Data technology makes it possible to store and update data in the Hadoop file system for the
filter "Samara region" (Filter1 = {Samara region}). This set is then filtered by the basic parameters for
assessing the ME, for example with the following filters: Filter2 (economy) = {roads, goods}; Filter3
(environment) = {forest, air}; Filter4 (society and culture) = {nightclub, concert, session, hangout}.</p>
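      <p>The filters can be represented as keyword sets, with each record assigned to every domain whose
keywords it mentions, provided it first passes Filter1. The Scala sketch below encodes Filter1 to Filter4
exactly as listed above; the case-insensitive substring matching and the names
<monospace>Filters</monospace> and <monospace>domainsOf</monospace> are assumptions made for
illustration.</p>
      <preformat preformat-type="code">
```scala
// Hypothetical encoding of Filter1 to Filter4 as keyword sets.
object Filters {
  // Filter1: the regional filter applied to the whole Hadoop data set.
  val region = "samara region"

  // Filter2 to Filter4: keyword sets for the basic ME assessment domains.
  val domains: Map[String, Set[String]] = Map(
    "economy" -> Set("roads", "goods"),
    "environment" -> Set("forest", "air"),
    "society and culture" -> Set("nightclub", "concert", "session", "hangout")
  )

  // Domains whose keywords a record mentions; empty unless the record passes Filter1.
  def domainsOf(record: String): Set[String] = {
    val text = record.toLowerCase
    if (!text.contains(region)) Set.empty
    else domains.collect { case (domain, words) if words.exists(w => text.contains(w)) => domain }.toSet
  }
}
```
      </preformat>
      <p>Plain substring matching is deliberately naive: "air" also matches "fair" or "chair", so a real
filter would tokenize the text before comparing words.</p>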
      <p>It is possible to plot the number of users accessing the filters (i.e., the value of the ME
internetization indicator) against the data collection time.</p>
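      <p>Such a graph amounts to counting, for each time interval, the distinct users whose posts match a
filter keyword. A minimal sketch, assuming each structured post carries an ISO-8601 timestamp and that
hourly intervals are wanted (both assumptions; the paper does not state the interval). The names
<monospace>Post</monospace> and <monospace>AccessSeries</monospace> are hypothetical.</p>
      <preformat preformat-type="code">
```scala
import java.time.Instant
import java.time.temporal.ChronoUnit

// Hypothetical structured post: ISO-8601 timestamp, user id and text.
final case class Post(timestamp: String, user: String, text: String)

object AccessSeries {
  // Number of distinct users mentioning `keyword` in each hour, ordered by time.
  def hourlyUsers(posts: Seq[Post], keyword: String): Seq[(Instant, Int)] =
    posts
      .filter(p => p.text.toLowerCase.contains(keyword.toLowerCase))
      .groupBy(p => Instant.parse(p.timestamp).truncatedTo(ChronoUnit.HOURS))
      .map { case (hour, ps) => (hour, ps.map(_.user).distinct.size) }
      .toSeq
      .sortBy(_._1.toEpochMilli)
}
```
      </preformat>
      <p>Each (hour, users) pair is one point of the graph; changing the truncation unit gives daily or
per-minute series instead.</p>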
      <p>The time of data collection from the Internet in BIG DATA technology is unlimited.</p>
      <p>As a result, we obtain information from the Internet that changes dynamically in real time. This
allows stream monitoring and analysis of unstructured information (In-Memory Data Processing and Stream
technologies) by filters with minimal investment. To implement this method, a program was written in
the Scala programming language for Apache Spark:</p>
      <preformat preformat-type="code">
// sc is the SparkContext (predefined in spark-shell)
// load the collected data from HDFS (path elided in the original)
val file = sc.textFile("hdfs://…")
// Filter1: keep only records mentioning the Samara region; cached because it is reused below
val samara = file.filter(line =&gt; line.contains("Samara region")).cache()
// count all the data matching Filter1
samara.count()
// count the data that also mention a Filter4 keyword
samara.filter(line =&gt; line.contains("concert")).count()
// fetch the matching records as an array of strings
samara.filter(line =&gt; line.contains("doctor consultation")).collect()
</preformat>
      <p>Running the program yields dynamically changing parameters in the Big Data environment, from which
some indices of ME involvement in the “digital economy” can be determined, taking unstructured
information into account. This method of collecting information for estimating parameters can also
be applied to other social networks and websites, and statistics from official sources published on
the Internet can be used as well.</p>
      <p>Thus, a tool is proposed for data collection in the ME system of indicators in the “digital economy”.</p>
      <p>The system of indicators of ME involvement in the introduction of “digital state” technologies is
shown in figure 3.</p>
      <sec id="sec-1-2">
        <p>The indicator system includes the following indicators:</p>
        <p>- Digital Economy: the indicator for innovation, entrepreneurship, the city's competitiveness, the
indicator for producibility, the labor market, the indicator for financial independence;</p>
        <p>- Digital Mobility: local transport system, (inter-)national accessibility, ICT infrastructure,
transport system stability;</p>
        <p>- Person and ICT: the indicator for intelligence, lifelong learning, ethnic variety;</p>
        <p>- Digital Life: cultural and entertainment facilities, health status, individual security, housing
quality, educational institutions, tourist attraction, social cohesion;</p>
        <p>- Digital Government: political awareness, public and social services, effective and transparent
administration;</p>
        <p>- Digital Environment: air quality (without pollution), environmental awareness, sustainable
resource management.</p>
        <p>These indicators have expert values. To compare different indicators, the values from the samples
of several cities must be standardized. The study uses the z-transformation standardization method,
given by the following formula:
<disp-formula><tex-math>z_i = \frac{x_i - \bar{x}}{S},</tex-math></disp-formula>
where <inline-formula><tex-math>x_i</tex-math></inline-formula> is the value of the indicator for the
<italic>i</italic>-th city, <inline-formula><tex-math>\bar{x}</tex-math></inline-formula> is the average value
in the sample and <inline-formula><tex-math>S</tex-math></inline-formula> is the standard deviation in the
sample. This method converts all values of the indicators into standardized values with a mean of 0 and
a standard deviation of 1. It takes heterogeneity within groups into account and preserves the metric
information of the data; it is also highly sensitive to changes.</p>
        <p>To obtain results at the level of individual indicators, indicator groups and a final result for each
city, the values must be aggregated level by level. When aggregating the indicators of a group by
domain, we also take into account the coverage factor of each indicator: a result from an indicator
covering all cities weighs more than one from an indicator covering only, for example, 6 cities. Apart
from this slight correction, the results were aggregated at all levels without weighting: the values
were summed and divided by the number of values added. This allows cities that do not cover all
indicators to be included; their results are calculated from the available values. However, good
coverage of all cities must be ensured to obtain reasonable results.</p>
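        <p>One possible reading of this aggregation rule is sketched below. The paper does not give the
exact formula, so the following is an assumption: the coverage factor of an indicator is the share of
cities that have a value for it, each available value of a city is multiplied by its indicator's
coverage factor, and the products are summed and divided by the number of values added. Missing
indicators are simply skipped. The names <monospace>Aggregation</monospace>,
<monospace>coverage</monospace> and <monospace>domainScore</monospace> are hypothetical.</p>
        <preformat preformat-type="code">
```scala
// Hypothetical coverage-weighted aggregation (one assumed reading of the rule in the text).
object Aggregation {
  // z-values of one city: indicator name -> value; missing indicators are absent.
  type CityValues = Map[String, Double]

  // Coverage factor of an indicator: share of cities that have a value for it.
  def coverage(cities: Seq[CityValues], indicator: String): Double =
    cities.count(_.contains(indicator)).toDouble / cities.size

  // Domain score of one city: coverage-weighted values of the indicators the city has,
  // summed and divided by the number of values added. None if the city has none of them.
  def domainScore(city: CityValues, cities: Seq[CityValues], domain: Seq[String]): Option[Double] = {
    val available = domain.filter(city.contains)
    if (available.isEmpty) None
    else Some(available.map(ind => coverage(cities, ind) * city(ind)).sum / available.size)
  }
}
```
        </preformat>
        <p>With two cities where only one reports "labour market", that indicator gets a coverage factor
of 0.5 and therefore half the weight of an indicator reported by both.</p>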
        <p>Some indicators can be calculated rather than assigned by experts; these include the indicator of
manufacturability of production, the index of innovation, the indicator of internetization, the index of
intellectualization, the indicator of financial independence, the index of energy efficiency and the
indicator of the introduction of creative technologies. These groupings allow management decisions to
be made quickly on the basis of the average values of the indicators. If the average is 3.7 or above,
the ME is ready to introduce digital economy technologies. If the average is in the range from 2.5 to
3.7, the ME has an average level of readiness. If the average is in the range from 1.95 to 2.5, the ME
has a satisfactory level of readiness. If the average is below 1.95, the ME is not ready to introduce
digital economy technologies [14].</p>
        <p>[Table: indicators by group, each group with a total. Digital Economy: indicator for innovation,
entrepreneurship, competitiveness, indicator for producibility, labour market, international
integration, (financial) independence; Person and ICT: indicator for intelligence, lifelong learning,
ethnic variety, openness; Digital Mobility: local transport system, (inter-)national accessibility,
ICT infrastructure.]</p>
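        <p>The readiness scale can be expressed as a small classification function. The text gives the
ranges "3.7 and above", "from 2.5 to 3.7", "from 1.95 to 2.5" and "below 1.95"; the sketch assumes that
each lower bound is inclusive, which the paper does not state explicitly. The names
<monospace>Readiness</monospace> and its level objects are hypothetical.</p>
        <preformat preformat-type="code">
```scala
// Hypothetical encoding of the readiness scale based on the average indicator value.
object Readiness {
  sealed trait Level
  case object Ready extends Level
  case object AverageLevel extends Level
  case object Satisfactory extends Level
  case object NotReady extends Level

  // Map an ME's average indicator value to its readiness level.
  // Assumption: 3.7 counts as ready, 2.5 as average and 1.95 as satisfactory.
  def classify(avg: Double): Level =
    if (avg >= 3.7) Ready
    else if (avg >= 2.5) AverageLevel
    else if (avg >= 1.95) Satisfactory
    else NotReady
}
```
        </preformat>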
      </sec>
    </sec>
    <sec id="sec-2">
      <title>4. Results and discussion</title>
      <p>Using the methodology developed above, some key figures and indicators of the readiness of cities in
the Volga region to implement the technologies of the "digital state" were calculated.</p>
      <p>Table 3 shows the calculated relative and absolute values for the MEs of the Volga region, taking into
account the z-transformation.</p>
      <p>Figures 4-9 show the histograms of the digital city indicators for the cities of Samara and
Ulyanovsk.</p>
      <sec id="sec-2-2">
        <p>[Figures 4-9: histograms of the "digital city" indicators for Samara and Ulyanovsk, one figure per group (Digital economy, Person and ICT, Digital mobility, Digital life, Digital government, Digital environment), each showing the individual indicators and the group average.]</p>
        <p>Applying a similar method, the levels of the indicator groups can be obtained for the cities of
Ulyanovsk and Samara (figure 10). The histogram data show that Samara is ahead of Ulyanovsk in the
“digital production”, “digital economy” and “digital life” indicators, whereas the "people and ICT",
"digital environment" and "digital life" indicators are better for the Ulyanovsk municipal entity.
Overall, both cities are equally ready to introduce ICT.</p>
        <p>[Figure 10: levels of the indicator groups for Samara and Ulyanovsk and their average.]</p>
        <p>Let us carry out a comparative analysis of the "digital city" indicators for municipalities with a
lower level of readiness, i.e. those below the zero level, which require significant investment in the
introduction of ICT. A comparative graph of the study results is shown in figure 11.</p>
        <p>As a result, the main trends for investing in the MEs of the Volga region can be identified on the
basis of a comparative analysis of the "digital city" indicators. Figure 11 shows that investing in an
ME with indicators below the zero level is not profitable. Many cities of the Volga region belong to
this zone, for example, Zhigulevsk. It is better to invest in MEs with a level of readiness above the
zero level, for example, Ulyanovsk. These cities correspond more closely to the concept of the "digital
city" and are almost ready to introduce the technologies of the "digital state".</p>
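        <p>The screening rule described here (invest in MEs above the zero level, treat those below it as
requiring significant ICT investment) can be sketched as a partition of cities by the sign of their
aggregated z-score. Any city names and scores used with it are hypothetical, and placing a score of
exactly zero in the lower group is an assumption, since the paper speaks only of "above" and "below" the
zero level. <monospace>InvestmentScreen</monospace> is a hypothetical name.</p>
        <preformat preformat-type="code">
```scala
// Hypothetical screen that splits cities by the sign of their aggregated "digital city" z-score.
object InvestmentScreen {
  // Returns (cities above zero, cities at or below zero), each sorted from highest to lowest score.
  def partition(scores: Map[String, Double]): (Seq[String], Seq[String]) = {
    val (above, below) = scores.toSeq.sortBy(-_._2).partition(_._2 > 0)
    (above.map(_._1), below.map(_._1))
  }
}
```
        </preformat>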
        <p>Thus, the assessment model makes it possible to determine which municipal entities are developed
enough to implement the digital state, to identify shortcomings in the group of MEs "not ready for
implementation", and to improve the performance of MEs on the basis of a detailed analysis of data on
all major cities of the Volga region.</p>
        <p>[Figure 11: "digital city" and "digital economy" indicators for Samara, Ulyanovsk, Zhigulevsk and the average of all cities.]</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. References</title>
      <p>[1] Access mode: http://www.unece.org/fileadmin/DAM/hlm/documents/2015/ECE_HBP_2015_4.ru.pdf (9.11.2017).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <label>[2]</label>
        <mixed-citation>Access mode: http://agencysgm.com/projects/Рейтинг%20устойчивого%20развития-2015.pdf (9.11.2017).</mixed-citation>
      </ref>
      <ref id="ref2">
        <label>[3]</label>
        <mixed-citation>Terekhin E A 2017 Computer Optics 41(5) 719-725 DOI: 10.18287/2412-6179-2017-41-5-719-725</mixed-citation>
      </ref>
      <ref id="ref3">
        <label>[4]</label>
        <mixed-citation>Afanasyev A A and Zamyatin A V 2017 Computer Optics 41(3) 431-440 DOI: 10.18287/2412-6179-2017-41-3-431-440</mixed-citation>
      </ref>
      <ref id="ref4">
        <label>[5]</label>
        <mixed-citation>Vorobiova N S, Sergeyev V V and Chernov A V 2016 Computer Optics 40(6) 929-938 DOI: 10.18287/2412-6179-2016-40-6-929-938</mixed-citation>
      </ref>
      <ref id="ref5">
        <label>[6]</label>
        <mixed-citation>Boori M S, Kuznetsov A V, Choudhary K K and Kupriyanov A V 2015 Computer Optics 39(5) 818-822 DOI: 10.18287/0134-2452-2015-39-5-818-822</mixed-citation>
      </ref>
      <ref id="ref6">
        <label>[7]</label>
        <mixed-citation>Akaslan D and Taskin S 2016 4th Int. Istanbul Smart Grid Congress and Fair (New York: IEEE Press)</mixed-citation>
      </ref>
      <ref id="ref7">
        <label>[8]</label>
        <mixed-citation>De Domenico M, Lima A A and Gonzalez M C 2015 EPJ Data Science 1 1-11</mixed-citation>
      </ref>
      <ref id="ref8">
        <label>[9]</label>
        <mixed-citation>Glebova I S, Yasnitskaya Y S and Maklakova N V 2014 Mediterranean J. of Social Sciences 12 129-133</mixed-citation>
      </ref>
      <ref id="ref9">
        <label>[10]</label>
        <mixed-citation>Ishkineeva G, Ishkineeva F and Akhmetova S 2015 Asian Social Science 5 70-73</mixed-citation>
      </ref>
      <ref id="ref10">
        <label>[11]</label>
        <mixed-citation>Khatoun R and Zeadally S 2016 Communications of the ACM 8 46-57</mixed-citation>
      </ref>
      <ref id="ref11">
        <label>[12]</label>
        <mixed-citation>Khaimovich I N, Ramzaev V M and Chumak V G 2016 CEUR Workshop Proceedings 1638 864-872</mixed-citation>
      </ref>
      <ref id="ref12">
        <label>[13]</label>
        <mixed-citation>Khaimovich I N, Ramzaev V M and Chumak V G 2015 CEUR Workshop Proceedings 1490 327-337</mixed-citation>
      </ref>
      <ref id="ref13">
        <label>[14]</label>
        <mixed-citation>Komarevtseva O O 2017 R-Economy 3 32-39</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>