<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Big Data Analysis for Demand Segmentation of Small Business Services by Activity in Region</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>V.M. Ramzaev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>I.N. Khaimovich</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>V.G. Chumak</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>International Market Institute</institution>
          ,
          <addr-line>Aksakova street, 21, 443030, Samara</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Samara National Research University</institution>
          ,
          <addr-line>34 Moskovskoe Shosse, 443086, Samara</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>48</fpage>
      <lpage>53</lpage>
      <abstract>
        <p>The article proposes a tool for improving the efficiency of budget spending on small business in a region. This is one of the most important tasks under current economic conditions, and solving it makes effective management decisions possible. The proposed method of regulation, based on the analysis of social networks using BIG DATA technology, can be effective in managing the various innovative processes of regional economic development, which are characterized by a variety of forms, a wide range of components and factors, dynamic development and an active transformation of everyday life. Modern BIG DATA software and hardware allow changes to be evaluated and visualized in real time.</p>
      </abstract>
      <kwd-group>
        <kwd>competitiveness</kwd>
        <kwd>territory management</kwd>
        <kwd>intensive data</kwd>
        <kwd>mathematical models</kwd>
        <kwd>BIG DATA technology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Subject of research</title>
      <p>Under modern social and economic conditions, a vital task is the state regulation of market economy players, among which small
business (SB) is one of the most important in the region. Foreign experience shows that the economy cannot develop without this
sector, because the economic growth rate depends on it, as do the structure and quality of up to 40-50% of gross
national product.</p>
      <p>The structure of small and medium-sized enterprises varies by type of economic activity. As can be seen in Figure 1, the
largest number of enterprises are engaged in trade and in the repair of motor vehicles, motorcycles, household products and personal items,
which is explained by the lower barriers to entry in these areas of activity.</p>
      <p>The following features of SB development management can be distinguished. First, SB entities provide a wide range of
services and sell a huge range of goods. Second, SB is significantly more mobile than large business. By mobility we mean
continuous change in market conditions, with old economic entities closing and new ones emerging. This is explained by the high
variability in the tastes and preferences of the consumers of SB goods and services, i.e. there is an active process in which some
types of activity are replaced by others as consumer demand changes, which is especially important under the current conditions of
import substitution. Thus, according to statistics, up to 85% of new SB entities close during their first year of existence, and
94 out of every 100 registered small businesses stop operating by the fourth year.</p>
      <p>In this regard, traditional methods of public administration, based on monthly, quarterly and annual
statistics, do not bring the expected result and do not reveal trends in the development or decline of particular
activities. As a result, decisions about financial support and the allocation of funds to projects often lag significantly behind
the needs, and in some cases contradict the real market situation, which has changed by the time the financing begins.</p>
      <p>For example, state support for small and medium-sized enterprises in the Samara region is currently
implemented within the framework of the State Program “Development of Entrepreneurship, Trade and Tourism in the Samara
Oblast” for 2014-2019, approved by Decree No. 699 of the Government of the Samara Oblast dated November 29, 2013. Support for
entrepreneurs in the Samara Oblast is provided in several directions and consists of information and consulting services, training,
financial assistance, and assistance in selling goods and services.</p>
      <p>
        At the same time, despite the measures used by the regional authorities to manage the
development of SB, no effective methods have yet been developed for selecting priority directions of SB development
that would make it possible to direct budgetary funds to the development and support of entrepreneurs more appropriately [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1-4</xref>
        ]. The
market of small and medium-sized businesses is a dynamically changing environment. This must be taken into account
in medium- and long-term planning, and regional authorities should use it as a basis for supporting and stimulating
the most in-demand areas of SB activity and for monitoring how effectively budgetary funds are spent
on programs in this field of entrepreneurship under changing market conditions.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods of applying business intelligence to determine small business segments in the region</title>
      <p>
        This task can be solved with the help of modern information technologies [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5,6,7</xref>
        ], including BIG DATA technology, which is
directly connected with business intelligence [
        <xref ref-type="bibr" rid="ref10 ref8">8,10</xref>
        ]. In addition, modern BIG DATA technologies make it possible to identify, in real time,
the zones (territories) of the most active consumption of and demand for particular products and
services on the market.
      </p>
      <p>
        To manage the development of small and medium-sized businesses in the region, a special methodology based
on BIG DATA technology was developed [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. It consists of the following stages: identifying the role and place of small business in
the region; identifying the main types of goods and services offered by small businesses in the region; creating a profile of the
consumer who uses small business services; creating an information model of the small business consumer in the region; forming
small business zones in the region; and developing guidelines for management decision making.
      </p>
      <p>Once the role and place of small business in the region and the main types of goods and services offered by entrepreneurs in the region
have been analyzed, BIG DATA technology is needed to create the consumer profile and the information model. The method
of applying business intelligence is as follows:
1. Form a BIG DATA set in Hadoop from Twitter using the filter “Samara Oblast”, recording the hit count;
2. Divide the resulting set by filters connected with the basic factors of small business;
3. Monitor the content of the stream within each filter;
4. Take prompt action when a stable “burst” in the hit count occurs;
5. Develop a program in the Scala language for filtering in the BIG DATA environment;
6. Debug and test the program on a set of practical data;
7. Analyze the computational results.</p>
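      <p>Step 4 of the list above depends on recognizing a stable “burst” in a hit-count series. The text does not define this criterion, so the following Python sketch only illustrates one possible rule: a burst is assumed to be a run of at least min_run consecutive intervals whose counts exceed factor times the mean of the preceding window intervals. The function name stable_bursts and all three parameters are hypothetical.</p>
      <preformat preformat-type="code">
```python
from statistics import mean


def stable_bursts(hit_counts, window=3, factor=2.0, min_run=2):
    """Return (start_index, run_length) for every stable burst.

    Assumed rule (not taken from the paper): a burst starts at index i when
    the counts stay above factor * baseline for at least min_run consecutive
    intervals, where baseline is the mean of the window counts before i.
    """
    bursts = []
    i = window
    while len(hit_counts) > i:
        baseline = mean(hit_counts[i - window:i])
        threshold = factor * baseline
        run = 0
        # Extend the run while the counts stay above the threshold.
        while len(hit_counts) > i + run and hit_counts[i + run] > threshold:
            run += 1
        if run >= min_run:
            bursts.append((i, run))
            i += run  # continue after the burst
        else:
            i += 1    # an isolated spike is not treated as stable
    return bursts
```
</preformat>
      <p>With these defaults a single isolated spike is ignored, while several elevated intervals in a row are reported together with their starting position, which is the point at which prompt action would be considered.</p>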
      <p>To obtain data we use the social network Twitter, because it is an “open” product, using it requires no additional
investment, and 50% of Internet users have profiles in it. Twitter is the second most popular network among users
worldwide, second only to Facebook. However, unlike Facebook, which does not make its data accessible, Twitter
provides such access, and there are no limitations on access to the data sets on the server. The users of this social network mainly share
text messages, which is a clear advantage for processing. Twitter is not a network with a specific focus and reflects public opinion
more broadly on many points of interest, which is why processing data from this social network was the best option
for forming small business zones in the region.</p>
      <p>To work with BIG DATA in social networks we used methods of collecting, processing and analyzing the data. Data
are collected in real time, either within a certain geolocation or across the entire network according to predefined
patterns. The information of interest for analysis in the area of SB is: location, date and time, content, the “author” of the content (user), and links
with other users. Data can be collected with the following tools: Apache Hadoop, BigInsights (IBM), Cloudera,
Hortonworks, Storm. For the research in the field of SB we chose Hortonworks. We used a Twitter Application
(apps.twitter.com), in which the key parameters were defined: API key, API secret, Access token, Access token secret.</p>
      <p>To collect data with Hortonworks and the Twitter App we used a Flume service configuration file in the Hortonworks Virtual Machine
Sandbox. The system is ready to load data from Twitter once Hortonworks Virtual Machine Sandbox version 2.3 is installed and the Flume
service is configured. The downloaded files for data processing can be viewed in the HDFS folder. The HDFS view in the Hortonworks
virtual machine used for SB tasks is shown in Fig. 2.</p>
      <p>The collected data must be structured (i.e. processed) according to the MapReduce paradigm. MapReduce is a programming model and
an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.</p>
      <p>MapReduce made it possible to structure the data flow from social networks by the following criteria: font, text size, color, user
profile hyperlink, location, date and others.</p>
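      <p>The MapReduce paradigm described above can be illustrated with a minimal single-process Python sketch. It is not the Hadoop job used in the study: it only mimics the map, shuffle and reduce phases in memory in order to count tweets per location. The record layout (a dictionary with a location key) and the function names are assumptions made for illustration.</p>
      <preformat preformat-type="code">
```python
from collections import defaultdict
from itertools import chain


def map_tweet(tweet):
    """Map phase: emit (location, 1) for a tweet that carries a location."""
    location = tweet.get("location")
    return [(location, 1)] if location else []


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: sum the values collected for one key."""
    return key, sum(values)


def tweets_per_location(tweets):
    """Run the three phases in sequence over an in-memory list of tweets."""
    mapped = chain.from_iterable(map_tweet(t) for t in tweets)
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())
```
</preformat>
      <p>On a cluster the map and reduce functions run in parallel on different nodes and the framework performs the shuffle. Here the three phases are ordinary function calls, which keeps the data flow easy to follow.</p>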
      <p>To define the user profile for SB in our research we need data of the following types: location, text, language and date. We used
MapReduce in the Hortonworks Sandbox to retrieve only the required data. For data processing in the Hadoop environment we chose
Hive DB, which makes it possible to operate on the data and analyze it with SQL-like queries. For this we created the SQL script hiveddl.sql,
which creates the necessary tables. The file contents are shown below:
-- twitter table identifiers
CREATE EXTERNAL TABLE tweets_raw (
  id BIGINT,
  created_at STRING,
  source STRING,
  favorited BOOLEAN,
  retweet_count INT,
  retweeted_status STRUCT&lt;
    text:STRING,
    usr:STRUCT&lt;screen_name:STRING,name:STRING&gt;&gt;,
  entities STRUCT&lt;
    urls:ARRAY&lt;STRUCT&lt;expanded_url:STRING&gt;&gt;,
    user_mentions:ARRAY&lt;STRUCT&lt;screen_name:STRING,name:STRING&gt;&gt;,
    hashtags:ARRAY&lt;STRUCT&lt;text:STRING&gt;&gt;&gt;,
  text STRING,
  usr STRUCT&lt;screen_name:STRING, name:STRING, friends_count:INT, followers_count:INT, statuses_count:INT,
    verified:BOOLEAN,
    utc_offset:STRING, -- was INT but nulls are strings
    time_zone:STRING&gt;,
  in_reply_to_screen_name STRING,
  year INT,
  month INT,
  day INT,
  hour INT
);
-- the storage clauses of tweets_raw are not shown in this listing
-- lookup table that maps a time zone to a country
CREATE EXTERNAL TABLE time_zone_map (
  time_zone string,
  country string,
  notes string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/data/time_zone_map';
…
-- label every tweet by the sign of its summed word polarity
create table tweets_sentiment stored as orc as select
  id,
  case
    when sum( polarity ) &gt; 0 then 'positive'
    when sum( polarity ) &lt; 0 then 'negative'
    else 'neutral' end as sentiment
from l3 group by id;
-- put everything back together and re-number sentiment
CREATE TABLE tweetsbi
STORED AS ORC
AS
SELECT
  t.*,
  case s.sentiment
    when 'positive' then 2
    when 'neutral' then 1
    when 'negative' then 0
  end as sentiment
FROM tweets_clean t LEFT OUTER JOIN tweets_sentiment s on t.id = s.id;</p>
      <p>Execute the script with the command: hive -f hiveddl.sql. The structured data are placed in Table 1.</p>
      <p>The following metrics are used for data analysis. The total number of tweets Kol<sub>R</sub> for every location R is defined as:</p>
      <disp-formula><tex-math><![CDATA[Kol_R = \sum_{i=1}^{N} k_i, \quad k_i \in R,]]></tex-math></disp-formula>
      <p>where k<sub>i</sub> is each successive tweet from the processing stream.</p>
      <p>The unique word frequency ch(m) is defined over the total collection L of text data:</p>
      <disp-formula><tex-math><![CDATA[ch(m) = \sum_{i=1}^{N} m_i, \quad m_i \in L.]]></tex-math></disp-formula>
      <p>The relationship of every tweet otn(m, rez) is defined from the thesaurus tez, in which the relationship to every word is set:</p>
      <disp-formula><tex-math><![CDATA[otn(m, rez) = \begin{cases} 0, & m \in negative\_value \\ 1, & m \in neutral\_value \\ 2, & m \in positive\_value \end{cases}]]></tex-math></disp-formula>
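      <p>The three metrics above can be sketched in Python as follows. This is only an illustration under stated assumptions: the tweet records are dictionaries with location and text keys, the thesaurus is a small hypothetical word-to-polarity mapping, and the tweet-level relationship is derived from the sign of the summed word polarity, mirroring the 0/1/2 numbering used in the Hive script. None of the names or the sample words come from the study's data.</p>
      <preformat preformat-type="code">
```python
from collections import Counter

# Hypothetical thesaurus: +1 for positive words, -1 for negative words.
# Words that are absent are treated as neutral (0).
THESAURUS = {"good": 1, "tasty": 1, "bad": -1, "expensive": -1}


def kol(tweets):
    """Kol_R: number of tweets for every location R."""
    return Counter(t["location"] for t in tweets)


def ch(tweets):
    """ch(m): frequency of every word m over the whole text collection L."""
    return Counter(word for t in tweets for word in t["text"].lower().split())


def otn(text, thesaurus=THESAURUS):
    """otn: 0 = negative, 1 = neutral, 2 = positive (sign of summed polarity)."""
    polarity = sum(thesaurus.get(word, 0) for word in text.lower().split())
    if polarity > 0:
        return 2
    if polarity == 0:
        return 1
    return 0
```
</preformat>
      <p>Splitting on whitespace is a deliberate simplification. Real tweets would need tokenization that handles punctuation, hashtags and Russian morphology before the thesaurus lookup.</p>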
      <p>For further work we created a dictionary with filters for the SB domain in order to determine the number of tweets by location ch(m)
and the number of tweets by location with respect to the relationship otn(m, rez). We define the thesaurus using a filter
with the base metrics of small and medium-sized businesses: food, clothes, entertainment and kids. This gave four base
metrics for small and medium-sized business.</p>
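      <p>The metric «food» P1 is calculated as the number of tweets matching the filter relative to the overall text data L:</p>
      <disp-formula><tex-math><![CDATA[P_1 = \frac{\sum_{i=1}^{N} S_i \,(S_i \in P_1)}{L}]]></tex-math></disp-formula>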
    </sec>
    <sec id="sec-4">
      <title>4. Results and discussion</title>
      <p>As a result, it is possible to conclude which area of SB is in especially high demand in the Samara Oblast. Figure
3 shows that the main strategy of the authorities for promoting SB should be connected with opening centers for children.</p>
      <p>Data Science / V.M. Ramzaev, I.N. Khaimovich, V.G. Chumak</p>
      <p>BIG DATA technology makes it possible to distribute and update data in the Hadoop file system using the filter “Samara Oblast”
(Filter1 = {Samara Oblast}). This set must then be filtered by the base metrics of small and medium-sized businesses, for example
with the following filters: Filter2 (food) = {cafe, bar, restaurant, cuisine*, beer*, meat, fish, tavern}; Filter3 (clothes) =
{coat, jacket, dres*, skir*, jacke*, bra*, stuf*}; Filter4 (entertainment) = {night club, concert, session, hangout}; Filter5 (kids) =
{kindergarten, baby-club, club}.</p>
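      <p>The filter sets listed above can be applied to tweet text roughly as in the following Python sketch. The descriptor lists are copied from the text. Everything else is an assumption made for illustration: a trailing asterisk is read as "any word ending", matching is on whole words and case-insensitive, and the function names are hypothetical.</p>
      <preformat preformat-type="code">
```python
import re

# Descriptors copied from the text; "*" marks a truncated word stem.
FILTERS = {
    "food": ["cafe", "bar", "restaurant", "cuisine*", "beer*", "meat", "fish", "tavern"],
    "clothes": ["coat", "jacket", "dres*", "skir*", "jacke*", "bra*", "stuf*"],
    "entertainment": ["night club", "concert", "session", "hangout"],
    "kids": ["kindergarten", "baby-club", "club"],
}
REGION = "samara oblast"  # Filter1


def compile_descriptor(descriptor):
    """Turn a descriptor such as 'beer*' into a whole-word regular expression."""
    stem = re.escape(descriptor.rstrip("*"))
    tail = r"\w*" if descriptor.endswith("*") else ""
    return re.compile(r"\b" + stem + tail + r"\b", re.IGNORECASE)


COMPILED = {name: [compile_descriptor(d) for d in words]
            for name, words in FILTERS.items()}


def matched_filters(text):
    """Sorted names of all filters with at least one descriptor in the text."""
    return sorted(name for name, patterns in COMPILED.items()
                  if any(p.search(text) for p in patterns))


def filter_shares(tweets):
    """Share of regional tweets that match each filter."""
    regional = [t for t in tweets if REGION in t.lower()]
    counts = {name: 0 for name in FILTERS}
    for text in regional:
        for name in matched_filters(text):
            counts[name] += 1
    total = len(regional)
    return {name: (counts[name] / total if total else 0.0) for name in counts}
```
</preformat>
      <p>Because “club” belongs to the kids filter, a tweet about a night club matches both the entertainment and the kids filters. The sketch reproduces this overlap rather than resolving it, since the text does not say how such cases are handled.</p>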
      <p>The set of descriptors for filtering Internet discourse is determined by the lexical representatives of the concept formed in
the worldview of the average Russian-speaking consumer of services. The main element of the concept sphere "Food" is the
microsituation "Cooking", which has the following cognitive and propositional structure: Subject - Predicate of cooking (how it is
cooked) - Object of cooking - Property of the cooking object - Method of cooking - Premises - Kitchenware - Appliances -
Devices - Affair - Substance - Food / Dish - Food quality / Dish quality. In Internet communication, only the
structure elements relevant to the user are made explicit, and their lexical interpretation lets us draw conclusions about the
needs of the residents of a particular district of the city of Samara. The blocks of descriptors for Filter3 (clothes), Filter4
(entertainment) and Filter5 (kids) can be built from the lexical and semantic fields “clothes” and “fashion”, the associative and
semantic field “leisure”, and the concept “childhood”.</p>
      <p>To make decisions in the area of SB it is necessary to build a multimodal clustering of social networks. The clustering is
based on the method of Formal Concept Analysis (FCA). Large volumes of structured and unstructured data give rise to triadic data.
For example, the data of social websites in the SB area may be represented as triples (user, group, interest)
(Fig. 4).</p>
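      <p>For a formal context (G, M, I), where G is the set of objects, M is the set of attributes and I ⊆ G × M is the incidence relation, the Galois operators are defined for A ⊆ G, B ⊆ M as follows:</p>
      <disp-formula><tex-math><![CDATA[A' \stackrel{\mathrm{def}}{=} \{ m \in M \mid g\,I\,m \ \text{for all}\ g \in A \}, \qquad B' \stackrel{\mathrm{def}}{=} \{ g \in G \mid g\,I\,m \ \text{for all}\ m \in B \},]]></tex-math></disp-formula>
      <p>where A is the formal extent and B is the formal intent.</p>
      <p>A formal concept is a pair (A, B) such that A ⊆ G, B ⊆ M, A' = B and B' = A.</p>
      <p>Concepts ordered by the relation (A1, B1) ≤ (A2, B2) ⇔ A1 ⊆ A2 (B2 ⊆ B1) form a complete lattice, called the concept lattice 𝔅(G, M, I).
An example of a social network context in the SB area and its concept lattice is shown in Table 2 and Figure 5.</p>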
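      <p>The derivation operators and the notion of a formal concept can be checked on a small example with the following Python sketch. It handles the ordinary two-dimensional (object, attribute) case only, not the triadic (user, group, interest) data, and it enumerates concepts by brute force over all subsets of objects, so it is suitable only for tiny contexts. The dictionary representation of the context and the function names are assumptions made for illustration.</p>
      <preformat preformat-type="code">
```python
from itertools import combinations


def up(objects, context, attributes):
    """A': the attributes shared by every object in A."""
    return frozenset(m for m in attributes
                     if all(m in context[g] for g in objects))


def down(attrs, context):
    """B': the objects that have every attribute in B."""
    return frozenset(g for g in context
                     if all(m in context[g] for m in attrs))


def formal_concepts(context):
    """Return all (extent, intent) pairs of a context given as
    {object: set_of_attributes}.

    Every subset of objects is closed with the two derivation operators;
    this is exponential in the number of objects.
    """
    attributes = frozenset().union(*context.values())
    objects = list(context)
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = up(subset, context, attributes)
            extent = down(intent, context)
            concepts.add((extent, intent))
    return concepts
```
</preformat>
      <p>For a hypothetical context in which user u1 is interested in food and kids, u2 in food, and u3 in kids and clothes, the sketch yields six concepts, among them the pair ({u1, u3}, {kids}), which groups the users who share the interest “kids”.</p>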
      <p>This clustering method makes it possible to identify interest groups with a growing number of connections, for which
managerial decisions are required. However, the tool has limitations. Twitter users belong mostly to the group
“students”, partly to the groups “employees” and “workers”, and only slightly to the group “pensioners”, so field marketing
research in these groups must be added before a management decision is taken.</p>
      <p>
        It is possible to obtain the correlation between the number of user requests for each filter and the date and time of data collection [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
The time span of data collection from the Internet using Big Data is not limited.
      </p>
      <p>As a result we obtain the dynamic change of information from the Internet in real time, which makes it possible to continuously monitor
and analyze unstructured information by filters with minimal investment (In-Memory Data Processing and Stream technology). To
implement this method we wrote a program in the Scala language:
// assumes a Spark shell session, where sc is the predefined SparkContext
val file = sc.textFile("hdfs://…")
val regional = file.filter(line =&gt; line.contains("Samara Oblast"))
// count all lines that mention the region
regional.count()
// count the regional lines that mention a filter word
regional.filter(line =&gt; line.contains("meat")).count()
// fetch the matching lines as an array of strings
regional.filter(line =&gt; line.contains("food")).collect()</p>
      <p>After executing the program we obtained the dynamic change of parameters in the BIG DATA environment, which makes it possible to identify
small and medium-sized business zones in a geographic region while taking unstructured information into account. If consistent “peaks” in hit counts
are detected for particular forms of business, a supporting investment program for the development of small and medium-sized
businesses should be launched with a focus on that business activity in the target area.</p>
      <p>In conclusion, we have proposed a tool for increasing the effectiveness of budget spending in a region. This is one of the most important
challenges in the current economic reality, and its solution rests on the ability to make management decisions in the most effective
way. The proposed approach to regulation can be efficient for managing innovative processes in the development of a regional economy, which are
characterized by many forms and a wide range of factors, as well as dynamic progression and an active transformation of daily life.</p>
      <p>Modern software and hardware make it possible to evaluate and visualize changes in almost real time, which can
be useful not only to regional governments but also to businesses when designing and implementing investment projects.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Drovyannikov</surname>
            <given-names>VI</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khaymovich</surname>
            <given-names>IN</given-names>
          </string-name>
          .
          <article-title>Development of control pattern to manage the competitive improvement of social cluster of the region</article-title>
          .
          <source>Fundamental Studies</source>
          <year>2015</year>
          ;
          <volume>7</volume>
          (
          <issue>4</issue>
          ):
          <fpage>822</fpage>
          -
          <lpage>827</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Drovyannikov</surname>
            <given-names>VI</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khaymovich</surname>
            <given-names>IN</given-names>
          </string-name>
          .
          <article-title>Simulation modelling of social cluster administration in the AnyLogic system</article-title>
          .
          <source>Fundamental Studies</source>
          <year>2015</year>
          ;
          <volume>8</volume>
          (
          <issue>2</issue>
          ):
          <fpage>361</fpage>
          -
          <lpage>366</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Ramzaev</surname>
            <given-names>VM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kukolnikova</surname>
            <given-names>EA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khaymovich</surname>
            <given-names>IN</given-names>
          </string-name>
          .
          <article-title>Development of a model for the functioning of production active elements in regional management</article-title>
          .
          <source>Bulletin of SSEU</source>
          <year>2014</year>
          ;
          <volume>12</volume>
          :
          <fpage>87</fpage>
          -
          <lpage>99</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Ramzaev</surname>
            <given-names>VM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khaymovich</surname>
            <given-names>IN</given-names>
          </string-name>
          .
          <article-title>Integrated model of control over economic development of the region based on competitiveness improvement of the enterprises</article-title>
          .
          <source>Modern Issues of Science and Education</source>
          <year>2014</year>
          ;
          <volume>6</volume>
          : 136 p.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Ramzaev</surname>
            <given-names>VM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khaymovich</surname>
            <given-names>IN</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chumak</surname>
            <given-names>VG</given-names>
          </string-name>
          .
          <article-title>Forecasting model of competitive growth for enterprises with energy modernization</article-title>
          .
          <source>Forecasting problems</source>
          <year>2015</year>
          ;
          <volume>1</volume>
          :
          <fpage>67</fpage>
          -
          <lpage>75</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Bonacich</surname>
            <given-names>P</given-names>
          </string-name>
          .
          <article-title>Power and Centrality: A Family of Measures</article-title>
          .
          <source>American Journal of Sociology</source>
          <year>1987</year>
          ;
          <volume>92</volume>
          (
          <issue>5</issue>
          ):
          <fpage>1170</fpage>
          -
          <lpage>1182</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Chumak</surname>
            <given-names>PV</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramzaev</surname>
            <given-names>VM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khaimovich</surname>
            <given-names>IN</given-names>
          </string-name>
          .
          <article-title>Models for forecasting the competitive growth of enterprises due to energy modernization</article-title>
          .
          <source>Studies on Russian Economic Development</source>
          <year>2015</year>
          ;
          <volume>26</volume>
          (
          <issue>1</issue>
          ):
          <fpage>49</fpage>
          -
          <lpage>54</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Chumak</surname>
            <given-names>VG</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramzaev</surname>
            <given-names>VM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khaimovich</surname>
            <given-names>IN</given-names>
          </string-name>
          .
          <article-title>Challenges of Data Access in Economic Research based on Big Data Technology</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          <year>2015</year>
          ;
          <volume>1490</volume>
          :
          <fpage>327</fpage>
          -
          <lpage>337</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Chumak</surname>
            <given-names>VG</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramzaev</surname>
            <given-names>VM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khaimovich</surname>
            <given-names>IN</given-names>
          </string-name>
          .
          <article-title>Use of Big Data technology in public and municipal management</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          <year>2016</year>
          ;
          <volume>1638</volume>
          :
          <fpage>864</fpage>
          -
          <lpage>872</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Grechnikov</surname>
            <given-names>FV</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khaimovich</surname>
            <given-names>AI</given-names>
          </string-name>
          .
          <article-title>Development of the requirements template for the information support system in the context of developing new materials involving Big Data</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          <year>2015</year>
          ;
          <volume>1490</volume>
          :
          <fpage>364</fpage>
          -
          <lpage>375</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>