<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Reconstructing an Artificial Society on the Basis of Big Open Data</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Central Economics and Mathematics Institute Russian Academy of Sciences</institution>
          ,
          <addr-line>Moscow, Russian Federation</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Orel State University</institution>
          ,
          <addr-line>Orel, Russian Federation</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we integrate big data into a computer model of an artificial society. The model is agent-based and consists of several modules representing demographic, economic and financial processes, employment and consumption, and educational and administrative institutions. To create an artificial society that simulates the Russian Federation in 2014, we use big open data, including Federal State Statistics Service yearbooks and official information on the websites of the ministries. The algorithm for reconstructing the artificial society includes creating agents and organizations, distributing them among geographical regions, and setting interrelations between agents, households and organizations. For verification of the artificial society model we propose a DES-analysis method, which compares demographic, economic and social indicators of the simulation output with the real values of these indicators in the base year. We present a statistical analysis of the variation of modeling results over the retrospective period.</p>
      </abstract>
      <kwd-group>
        <kwd>Artificial society</kwd>
        <kwd>Agent-based model</kwd>
        <kwd>Computational experiment</kwd>
        <kwd>Verification</kwd>
        <kwd>Big data</kwd>
        <kwd>Statistical analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The management of socio-economic systems requires new methods and tools
for forecasting and planning.
Sustainable economic and technological growth requires development of
infrastructure, production capacities, human resources and living standards of the
population. To take multiple factors into account, it is necessary to analyze big
data, including open statistical information, results of sociological surveys and
monitoring of federal programs, as well as private data from ministries, departments,
social networks and search systems. Using these data requires special
methods and tools. In particular, we integrate big data analysis methods into
computer models of socio-economic processes.</p>
      <p>
        We have chosen agent-based modeling as the main method in this study, since
it allows the dynamics of a macro-system to be represented as a result of the interaction of
micro-level objects. The concept of agent-based modeling was proposed in the
1990s [8] and has since been widely applied in the analysis of economic,
financial, social and environmental processes [
        <xref ref-type="bibr" rid="ref5 ref7">5, 7, 11, 16, 19, 12</xref>
        ]. The complexity of
agent-based models has risen along with advances in computing power and
information resources, resulting in larger models with complex interactions, whose
inputs require sophisticated analytical approaches. Similarly, the increasing use
of data in agent-based models has further increased the complexity of their outputs
[10].
      </p>
      <p>
        The aim of our research is to construct an agent-based computer model of an
artificial society that reflects the age and sex structure and regional distribution
of the population, the composition of households, economic structures, and administrative
and educational institutions. The model is populated with data from federal
statistical yearbooks and official information on the websites of the ministries of
the Russian Federation [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 3, 2</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>Structure of the Model of an Artificial Society</title>
      <p>The developed model includes seven interconnected modules reflecting various
aspects of an artificial society: Demographics, Education, Employment,
Production, Consumption, Financial System and Public Administration (see Fig. 1).
Each module corresponds to a set of information objects and events that change
their state.</p>
      <p>The module "Demographics" reflects the birth, maturation and death of agents.
New households are formed after marriages and divorces. Agents can act as
workers, taxpayers, consumers, creditors and students, thus interacting with the
environment and with each other [13]. We set network connections between the
agents and determine the closeness of each connection in the range from 0 to 1 using
the following rules: 0: not acquainted; 0.1–0.3: neighbors, colleagues; 0.4–0.6:
friends; 0.7–0.9: relatives. Relationships among agents are represented in a square
matrix (Table 1). An agent has closeness 1 only with itself
(the main diagonal of the matrix). Connections are transitive: if the first agent
has no direct connection (closeness is 0) with the third agent, but has a
connection with the second agent (e.g. 0.5) and the second agent is connected
with the third agent (e.g. 0.8), the closeness of the indirect connection between
the first and the third agent is equal to the product of the intermediate
connections (in this example, 0.5 * 0.8 = 0.4). Indirect connections are
important for the process of information dissemination. Information about an agent
is available to another agent if the closeness of the relationship between them (direct or
indirect) is higher than a threshold [14].</p>
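      <p>The closeness rule can be illustrated with a short Python sketch. It is not the model's implementation (the model itself is written in C#): the three-agent matrix, the function names, and the choice to take the strongest single-intermediary path when several exist are assumptions made for illustration only.</p>
      <preformat>
```python
# Hypothetical closeness matrix for three agents; the diagonal is 1.0
# because an agent has closeness 1 only with itself.
closeness = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.8],
    [0.0, 0.8, 1.0],
]


def indirect_closeness(matrix, i, j):
    """Return the direct closeness if it is positive, otherwise the largest
    product of closeness values over one intermediate agent (an assumption:
    the text only describes a single chain)."""
    if matrix[i][j] > 0:
        return matrix[i][j]
    best = 0.0
    for k in range(len(matrix)):
        if k != i and k != j:
            best = max(best, matrix[i][k] * matrix[k][j])
    return best


def information_available(matrix, i, j, threshold):
    """Information about agent j reaches agent i if closeness exceeds the threshold."""
    return indirect_closeness(matrix, i, j) > threshold


first_to_third = indirect_closeness(closeness, 0, 2)  # product 0.5 * 0.8
```
      </preformat>
      <p>With a hypothetical threshold of 0.3, information_available(closeness, 0, 2, 0.3) is true, while a threshold of 0.5 blocks the same indirect connection.</p>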
      <p>Table 1. Closeness of relationships among agents A1, A2, ..., Ai, ..., An.</p>
      <p>The model includes three types of organizations: commercial, financial and
budgetary. Organizations interact with individual agents by hiring
employees, paying wages and firing them. In addition, educational organizations recruit
students, advance them to the next year of study and graduate them (module
"Education"). Financial organizations give credits to agents and commercial
organizations and take deposits from them (module "Finance").</p>
      <p>Organizations continuously interact with counterparties through sales,
deliveries and the corresponding financial settlements; each operation is recorded in the
accounting system, which is a simplified version of the system adopted in the Russian
Federation. Operations are written to the table "Accounting entries" as
tuples of the following structure:</p>
      <p>&lt;Date, Deb_acc, Deb_start, Cred_acc, Cred_start, Sum, Deb_fin, Cred_fin&gt;,
where Date is the date of the operation, Deb_acc is the debit account of the entry,
Deb_start is the value of the debit account before the operation, Cred_acc is the credit
account of the entry, Cred_start is the value of the credit account before the
operation, Sum is the sum of the operation, Deb_fin is the value of the debit account after the
operation, and Cred_fin is the value of the credit account after the operation [15].</p>
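      <p>As an illustration, the entry structure can be sketched in Python as follows. The field names mirror the tuple above; the posting rule (the debit account increases and the credit account decreases by the sum, as for two active accounts) and the account codes in the usage note are assumptions, since the text does not specify how balances are updated.</p>
      <preformat>
```python
import datetime
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountingEntry:
    """One row of the 'Accounting entries' table described in the text."""
    entry_date: datetime.date
    deb_acc: str
    deb_start: float
    cred_acc: str
    cred_start: float
    amount: float      # 'Sum' in the text
    deb_fin: float
    cred_fin: float


def post(balances, entry_date, deb_acc, cred_acc, amount):
    """Record an operation and update balances in place.

    Assumption: both accounts are active, so the debit account grows
    and the credit account shrinks by the amount of the operation.
    """
    deb_start = balances.get(deb_acc, 0.0)
    cred_start = balances.get(cred_acc, 0.0)
    balances[deb_acc] = deb_start + amount
    balances[cred_acc] = cred_start - amount
    return AccountingEntry(
        entry_date, deb_acc, deb_start, cred_acc, cred_start,
        amount, balances[deb_acc], balances[cred_acc],
    )
```
      </preformat>
      <p>For example, with hypothetical balances {"materials": 100.0, "cash": 500.0}, posting 200.0 from "cash" to "materials" yields an entry with Deb_start = 100.0, Deb_fin = 300.0, Cred_start = 500.0 and Cred_fin = 300.0.</p>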
      <p>
        To reproduce economic dynamics in the model we implement an algorithm
of organizational decision making instead of the production function that is widely
used in macroeconomic models [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [18]. Trading agents in each region compare
the stocks of products remaining at the end of the year with the stocks that
were available at the beginning. An excess of current stocks over
the initial level is interpreted as an increase in demand for final products compared to
the previous year; as a result, the volume of wholesale orders for the final product
grows. This causes a positive feedback loop in the form of increased
production and supply of materials and components. In case of a shortage of
production capacity, enterprises implement investment programs, attracting
credit resources from financial organizations if necessary. A decrease in demand for
final products causes a decrease in production and employment, which in turn
leads to a reduction in deliveries and investment programs.
      </p>
      <p>The public administration determines the structure of the budget, the taxation scale,
transfer payments, the interest rate and other parameters. Administrative
functions are implemented through educational, medical, social security and defense
budgetary organizations.</p>
      <p>At the current research stage we generate the information objects of each module
and their standard functions, excluding the decision-making procedures of agents
and organizations that determine the dynamics of the system.</p>
    </sec>
    <sec id="sec-3">
      <title>Initial Modeling Data Structure</title>
      <p>
        Initial modeling data that reflect socio-economic structures in the base year are
presented in Excel tables. The tables contain information on the demographic
structure of the population, organizations and their economic interrelations,
production, imports, exports, employment, financial characteristics of organizations
and households, tax rates, transfer payments and other parameters. The content of
the tables is based on the collections of the Federal State Statistics Service [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
the All-Russian Population Census, the Economic Development Ministry's reports, and Bank of
Russia [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and Ministry of Finance [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] open data. Table 2 presents the basic
tables of input data used in the creation of objects in different modules.
      </p>
      <p>Information presented in statistical resources requires preprocessing to match
the initial modeling data structure. For example, to reproduce the sector structure of
the economy in each region we need an aggregated table "Organizations". In the
official statistics, the sector structure of the economy, cross-sector interrelations,
exports and imports are presented in the input-output tables; the regional production
structure is presented in the table "Gross added value of the regions by sectors
of the economy" in the statistical yearbook. Direct comparison of the data in these
two tables is impossible for two reasons. First, the calculation of the gross regional
product differs from the calculation of the gross domestic product, as a result of
which the sum of the gross regional products of all regions is less than the gross
domestic product. Second, information on regional production is presented by
economic activity type, which is less detailed than the
sector structure (for example, 37 sectors are classified as one economic activity
type, "manufacturing activities"). Thus, the regional representation of production in
the model requires matching statistical data from different sources. The necessary
calculations are performed in the following steps:</p>
      <p>1. Calculation of share of each sector in the corresponding type of economic
activity on the basis of the input-output table.</p>
      <p>
        <disp-formula><tex-math>d_{s,a} = \frac{V_s}{V_a} \qquad (1)</tex-math></disp-formula>
        where $d_{s,a}$ is the share of sector s in economic activity type a; $V_s$ is the gross product of sector s,
calculated by the added value method; $V_a$ is the gross product of economic activity type
a, calculated by the added value method; sector s belongs to economic activity
type a.</p>
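      <p>2. Correction of the table of output of economic activities in regions, taking
into account the difference between the domestic product and the total of regional
products for each economic activity:
        <disp-formula><tex-math>k_a = \frac{V_a}{\sum_{r} v_{r,a}} \qquad (2)</tex-math></disp-formula>
        <disp-formula><tex-math>v^{k}_{r,a} = k_a \, v_{r,a} \qquad (3)</tex-math></disp-formula>
        where $k_a$ is the correction coefficient of economic activity type a; $V_a$ is the gross product of
economic activity type a; $v_{r,a}$ is the product of economic activity type a in region r,
presented in statistical tables; $v^{k}_{r,a}$ is the corrected product of economic activity type
a in region r.</p>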
      <p>3. Completion of the table of output of sectors in regions:</p>
      <p>
        <disp-formula><tex-math>v_{s,r} = v^{k}_{r,a} \, d_{s,a} \qquad (4)</tex-math></disp-formula>
        where $v_{s,r}$ is the product of sector s in region r; sector s belongs to economic activity type
a.</p>
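      <p>The three preprocessing steps can be sketched in Python as follows. This is an illustrative reading rather than the authors' code: it assumes that the gross product of an activity type is the sum of the products of its sectors, and that the correction coefficient is the ratio of that national total to the sum of the regional products of the same activity type. All names and numbers are hypothetical.</p>
      <preformat>
```python
def sector_shares(sector_product, sector_activity):
    """Step 1: share of each sector s in its economic activity type a."""
    activity_total = {}
    for sector, value in sector_product.items():
        activity = sector_activity[sector]
        activity_total[activity] = activity_total.get(activity, 0.0) + value
    shares = {
        sector: value / activity_total[sector_activity[sector]]
        for sector, value in sector_product.items()
    }
    return shares, activity_total


def corrected_regional(regional_product, activity_total):
    """Step 2: scale regional products so they sum to the national total."""
    regional_sum = {}
    for (region, activity), value in regional_product.items():
        regional_sum[activity] = regional_sum.get(activity, 0.0) + value
    k = {a: activity_total[a] / regional_sum[a] for a in regional_sum}
    corrected = {
        (region, activity): value * k[activity]
        for (region, activity), value in regional_product.items()
    }
    return k, corrected


def sector_regional(corrected, shares, sector_activity):
    """Step 3: product of sector s in region r."""
    result = {}
    for sector, activity in sector_activity.items():
        for (region, act), value in corrected.items():
            if act == activity:
                result[(sector, region)] = value * shares[sector]
    return result


# Toy input: two sectors of one activity type, two regions.
sector_product = {"food": 60.0, "textiles": 40.0}
sector_activity = {"food": "manufacturing", "textiles": "manufacturing"}
regional_product = {("A", "manufacturing"): 50.0, ("B", "manufacturing"): 30.0}

shares, totals = sector_shares(sector_product, sector_activity)
k, corrected = corrected_regional(regional_product, totals)
by_sector_region = sector_regional(corrected, shares, sector_activity)
```
      </preformat>
      <p>In this toy input the regional products sum to 80 while the national total is 100, so the correction coefficient is 1.25 and the sector-by-region values again sum to 100.</p>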
      <p>
        Preprocessing of initial data is required for all modules except
Demographics, since recent information about the composition of households in
each region is available in the results of the All-Russian Population Census [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>Algorithm of Artificial Society Reconstruction</title>
      <p>
        Reconstruction of an artificial society is carried out for the base year of modeling.
The first step is to set the geographical structure of the Russian Federation (see Fig.
2). After that, the original generation of agents is created, distributed among
households and allocated to regions; the composition of households is determined
by data of the All-Russian Population Census of 2010 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. After that, the financial
state of households is initialized.
      </p>
      <p>
        Organizations in the model are aggregated: one organization in the model
corresponds to the set of organizations of one economic sector in a region.
Generation of organizations is based on the input-output table, which determines the
gross output of each economic sector, and the table of the regional distribution of
production [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. After generating organizations we set their type (commercial,
financial or budgetary) and initialize the values of their accounts. Economic interrelations
between organizations, including sales and logistics, are set by the first quadrant
of the input-output table. Agents are distributed to workplaces in accordance
with their qualifications and the employment structure in each economic sector.
      </p>
      <p>Educational institutions are associated with sets of educational places for
various groups of specialties and levels of education: school, secondary
professional education, and bachelor's, master's or postgraduate courses. Agents of the
appropriate age are assigned to educational places.</p>
      <p>The generated artificial society is stored in a database for later use in a series
of scenario calculations.</p>
    </sec>
    <sec id="sec-5">
      <title>Software Implementation</title>
      <p>The model of an artificial society was programmed in C# in Microsoft Visual Studio
2015, which is freely available for scientific research. Figure 3 shows the sequence
of data processing in the model. The initial modeling data are loaded in the
form of Excel tables and then checked for completeness and consistency.
In the artificial society generation module, the initial modeling data are
transformed into information objects of the model (agents, households and
organizations).</p>
      <p>Results of the generation procedure are stored in the model database and
accessed through SQL queries. The main resulting tables are the accounting
entries of organizations, households and the state administration. For example, GDP is
calculated as the sum of profits, wages and taxes paid by commercial, financial and
budgetary organizations (we use the added value method of calculating GDP). Using
grouping queries, it is possible to present gross output by sector
and region; flows of imports and exports in different sectors; household incomes
and expenditures; and savings and credits of households and organizations.</p>
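      <p>A grouping query of this kind can be sketched with Python's built-in sqlite3 module. The table value_added, its columns and all amounts are hypothetical stand-ins: the actual database engine and schema of the model are not described here, and the mapping from accounting entries to profit, wage and tax components is assumed to have been done already.</p>
      <preformat>
```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical value_added table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE value_added ("
        "region TEXT, sector TEXT, component TEXT, amount REAL)"
    )
    rows = [
        ("Region A", "Agriculture", "profit", 10.0),
        ("Region A", "Agriculture", "wage", 20.0),
        ("Region A", "Agriculture", "tax", 5.0),
        ("Region A", "Manufacturing", "profit", 30.0),
        ("Region A", "Manufacturing", "wage", 40.0),
        ("Region A", "Manufacturing", "tax", 10.0),
        ("Region B", "Manufacturing", "profit", 15.0),
        ("Region B", "Manufacturing", "wage", 25.0),
        ("Region B", "Manufacturing", "tax", 5.0),
    ]
    conn.executemany("INSERT INTO value_added VALUES (?, ?, ?, ?)", rows)
    return conn


def gdp(conn):
    """GDP by the added value method: profits + wages + taxes."""
    row = conn.execute(
        "SELECT SUM(amount) FROM value_added "
        "WHERE component IN ('profit', 'wage', 'tax')"
    ).fetchone()
    return row[0]


def value_added_by(conn, column):
    """Group value added by 'region' or 'sector'."""
    if column not in ("region", "sector"):
        raise ValueError("column must be 'region' or 'sector'")
    query = (
        f"SELECT {column}, SUM(amount) FROM value_added "
        f"GROUP BY {column} ORDER BY {column}"
    )
    return dict(conn.execute(query).fetchall())
```
      </preformat>
      <p>Calling value_added_by(conn, "sector") on the demo database returns one total per sector, and value_added_by(conn, "region") gives the same breakdown by region.</p>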
      <p>The interface of the software application of the model is presented in Figure
4 (the "Start Modeling" button launches the algorithms of dynamics, which are currently
under development).</p>
    </sec>
    <sec id="sec-6">
      <title>DES-analysis for Verification of the Model</title>
      <p>Verification of the artificial society model was carried out in two stages. The
first stage is validation of the algorithms on a test data set. At this stage,
the procedures for entering, converting, storing and retrieving data and the algorithms
for generating model objects were checked. At the second stage, the model was
verified on the basis of retrospective data. For verification of the model on
retrospective data, we propose a method that compares demographic, economic
and social indicators of the simulation output with the real values of these
indicators (DES-analysis).</p>
      <p>The verifying simulation was carried out over a time period of one year (12 clock
ticks). The population of the Russian Federation was represented by 1.5 million
agents, that is, one agent in the model corresponds to 100 residents; for the
convenience of further interpretation of the output data, inverse scaling was
performed at the end of modeling. Given the assumption
that one organization in the model corresponds to the set of organizations of one
economic sector in a region, about 4.5 thousand organizations were created in
the model (58 sectors in the input-output tables and 90 regions; some sectors are
not present in certain regions). We conducted a series of 10 experiments;
Table 3 gives a summary of the verification results.</p>
      <p>
        The set of demographic indicators for comparison includes the population
size by sex and age group and the number and composition of households. As
economic indicators we have chosen GDP, gross output, and imports and exports of
economic sectors. Social indicators are the average wage, the number of unemployed
and the number of people below the poverty line. Variation of the observed
parameters is determined by stochastic elements of the generation procedures, including
the age of agents, the composition of households, and the deviation of wage values in different
economic sectors and regions from the average value. To analyze the adequacy
of the model, the output of the generation procedures was averaged over 10 runs
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [9].
      </p>
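      <p>The comparison step of DES-analysis can be sketched in Python as below: each indicator is averaged over the runs and compared with its real value as a relative deviation. The function names are hypothetical and the numbers are toy placeholders chosen for illustration, not results of the model.</p>
      <preformat>
```python
from statistics import mean


def relative_deviation(simulated_values, real_value):
    """Relative deviation of the run-averaged indicator from the real value."""
    average = mean(simulated_values)
    return abs(average - real_value) / abs(real_value)


def des_report(runs, real):
    """Compare run-averaged indicators with real values.

    runs: list of dicts mapping indicator name to simulated value (one per run)
    real: dict mapping indicator name to the real value in the base year
    """
    report = {}
    for name, real_value in real.items():
        values = [run[name] for run in runs]
        report[name] = relative_deviation(values, real_value)
    return report


# Toy placeholder numbers (not taken from the paper).
runs = [
    {"population": 99.0, "gdp": 103.0},
    {"population": 101.0, "gdp": 103.0},
    {"population": 100.0, "gdp": 103.0},
]
real = {"population": 100.0, "gdp": 100.0}
report = des_report(runs, real)
```
      </preformat>
      <p>An indicator would then be accepted if its entry in the report stays below a chosen tolerance, for example 0.01 for a 1% bound.</p>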
      <p>Statistical analysis of modeling results for the base year (2014) showed
deviation of the selected parameters within 1%, except for GDP, whose variation is
about 3%. This is mostly due to the simplified tax system in the model:
for organizations we set a 13% rate, while some sectors of the economy pay
additional taxes, which make a significant contribution to GDP. Taking this
assumption into consideration, we conclude that the calculated variation of modeling
parameters indicates sufficient accuracy of the artificial society reconstruction for
the base year. However, models that perform well on data sets available
at the time of their publication might perform
less well or badly when applied to post-publication data [17]. For this reason
we plan to verify the model on several additional input datasets (years 2015-2017) after they
become available in official statistics.</p>
      <p>For a short period of one year, we estimate the deviation of modeling results from
retrospective values, which is sufficient for assessing the similarity of the real society and
the reproduced artificial society for the base year of modeling. When the model is
calibrated on longer time series (at least 5 years), it will be possible to
determine autocorrelation and the mutual influence of various factors using methods
of regression analysis.</p>
    </sec>
    <sec id="sec-7">
      <title>Conclusions</title>
      <p>The computer model of an artificial society is designed to reproduce the
geographical distribution of the population of the Russian Federation and its current
socio-economic situation. In this paper we describe the structure of the initial
parameters of the simulation and the procedure for their verification. Since the
aim of our research is forecasting the economic development of Russia and assessing
the impact of state economic policy on this process, in future work we
are going to add algorithms of economic dynamics to the model and calibrate
them on retrospective data for the 2014-2017 period. After official registration of
the program model, access to the source code will be available through the Team
Foundation Server of Microsoft Visual Studio 2015. However, to reproduce the
presented computational results, preprocessed initial data sets must be loaded into
the model. To increase the accuracy of the forecast, we plan to refine
the initial modeling data by adding sociological surveys and results of social
network monitoring, which would help to reflect subjective parameters and social
moods that are not evident from standard statistical methods.</p>
      <p>Acknowledgments. The reported study was funded by RFBR according to
the research project 18-310-00185.</p>
      <p>8. Epstein, J., Axtell, R.: Growing Artificial Societies: Social Science From the Bottom Up. MIT Press, Brookings Institution (1996)</p>
      <p>9. Fonoberova, M., Fonoberov, V.A., Mezić, I.: Global sensitivity/uncertainty analysis for agent-based models. Reliability Engineering &amp; System Safety 118, 8–17 (2013). https://doi.org/10.1016/j.ress.2013.04.004, http://www.sciencedirect.com/science/article/pii/S0951832013000999</p>
      <p>10. Lee, J.S., Filatova, T., Ligmann-Zielinska, A., Hassani-Mahmooei, B., Stonedahl, F., Lorscheid, I., Voinov, A., Polhill, G., Sun, Z., Parker, D.: The complexities of agent-based modeling output analysis. JASSS: the Journal of Artificial Societies and Social Simulation 18(4) (2015)</p>
      <p>11. Macy, M.W., Willer, R.: From factors to actors: Computational sociology and agent-based modeling. Annual Review of Sociology 28, 143–166 (2002)</p>
      <p>12. Makarov, V.L., Bakhtizin, A.R., Sushko, E.D., Vasenin, V.A., Borisov, V.A., Roganov, V.A.: Supercomputer technologies in social sciences: agent-oriented demographic models. Herald of the Russian Academy of Sciences 86(3), 248–257 (2016)</p>
      <p>13. Mashkova, A.L., Demidov, A.V., Savina, O.A., Koskin, A.V., Mashkov, E.A.: Developing a complex model of experimental economy based on agent approach and open government data in distributed information-computational environment. In: eGose '17: Proceedings of the International Conference on Electronic Governance and Open Society: Challenges in Eurasia. pp. 27–31. ACM international conference proceedings series, ACM, New York, NY, USA (2017)</p>
      <p>14. Mashkova, A.L., Novikova, E.V., Savina, O.A.: Agent model for evaluating influence of tax policy on political preferences. In: EGOSE '16: Proceedings of the International Conference on Electronic Governance and Open Society: Challenges in Eurasia. pp. 258–261. ACM, New York, USA (2016)</p>
      <p>15. Mashkova, A.L., Savina, O.A.: Management of financial flows of organizations in the agent model of the experimental economy. Upravlencheskiy uchet 12, 89–98 (2015), (in Russian)</p>
      <p>16. Mashkova, A.L., Savina, O.A., Lazarev, S.A.: Agent model for evaluating efficiency of socially oriented federal programs. In: 11th IEEE International Conference on Application of Information and Communication Technologies (AICT). vol. 2, pp. 217–221. Institute of Control Sciences of Russian Academy of Sciences, Moscow (2017)</p>
      <p>17. Moss, S.: Alternative approaches to the empirical validation of agent-based models. Journal of Artificial Societies and Social Simulation 11(1), 5 (2008)</p>
      <p>18. Ogibayashi, S., Takashima, K.: Influence of the corporation tax rate on GDP in an agent-based artificial economic system. In: Chen, S., Terano, T., Yamamoto, R., Tai, C. (eds.) Advances in Computational Social Science. Agent-Based Social Systems, vol. 11, pp. 147–161. Springer, Tokyo (2014)</p>
      <p>19. Tesfatsion, L.: Agent-based computational economics: Growing economies from the bottom up. Artificial Life 8(1), 55–82 (2002)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <article-title>1. The Central Bank of the Russian Federation official site</article-title>
          , http://www.cbr.ru/eng/,
          <source>last accessed</source>
          <year>2018</year>
          /03/22
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <article-title>Ministry of Finance of the Russian Federation official site homepage</article-title>
          , http://old.minfin.ru/en/statistics/, last accessed
          <year>2018</year>
          /03/18
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Russian Federation Federal State Statistics Service, http://www.gks.ru,
          <source>last accessed</source>
          <year>2018</year>
          /03/26
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bakhtizin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Agent-based models of economy</article-title>
          . Ekonomika, Moscow (
          <year>2008</year>
          ), (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Barros</surname>
          </string-name>
          , J.:
          <article-title>Agent-based models of geographical systems. chap. Exploring Urban Dynamics in Latin American Cities Using an Agent-Based Simulation Approach</article-title>
          , pp.
          <fpage>571</fpage>
          –
          <lpage>589</lpage>
          . Springer, Dordrecht (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Baucells</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borgonovo</surname>
          </string-name>
          , E.:
          <article-title>Invariant probabilistic sensitivity analysis</article-title>
          .
          <source>Management Science</source>
          <volume>59</volume>
          (
          <issue>11</issue>
          ),
          <fpage>2536</fpage>
          –
          <lpage>2549</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Bonabeau</surname>
          </string-name>
          , E.:
          <article-title>Agent-based modeling: Methods and techniques for simulating human systems</article-title>
          .
          <source>Proceedings of the National Academy of Sciences 99(suppl 3)</source>
          ,
          <fpage>7280</fpage>
          –
          <lpage>7287</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>