<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Hybrid modeling applications for distributed information systems scaling tasks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrey A. Zenzinov</string-name>
          <email>andrey.zenzinov@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleg A. Abankin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Junior research worker, Institute of mechanics, Moscow State University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Student, Department of Mechanics and Mathematics, Moscow State University</institution>
        </aff>
      </contrib-group>
      <fpage>71</fpage>
      <lpage>77</lpage>
      <abstract>
        <p>The development and spread of information systems is generally accompanied by growing architectural complexity and by new requirements for software and resources. Modeling is an essential part of research into possible directions for the development of such systems. Developers of production information analysis systems (IAS) sooner or later have to scale them. This need is generally caused by growing system complexity and by a growing number of geographically distributed users. It motivates research on assessing scaling options and finding solutions that are optimal in terms of resource costs. The Intelligent System for Thematical Analysis of Scientometric Data (Russian abbreviation: ISTINA; hereafter, the System) [1] is a functioning CRIS system at Lomonosov Moscow State University (MSU). As part of a pilot project, the System is being deployed in several research institutes that are subordinate to the Federal Agency for Scientific Organizations (FASO) and distributed throughout Russia. This calls for infrastructure development decisions that satisfy the new availability requirements created by the growing number of geographically distributed users. The large distance between these users and the System increases data transmission delays compared with System usage within MSU. In this paper, taking into account System usage in new organizations, we consider the following three options: • “cloud” mode, in which web-servers and database (DB) servers are located in a single data center (DC); • web-servers are located in geographically distributed organizations, while DB-servers are located in a single DC; • web-servers and DB-servers are located in several geographically distributed DCs. Testing and verifying new architectural and software solutions on the production System raises a number of difficulties, including the following:</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>• interaction between the System and the other operational information systems in
organizations should be taken into account;
• modifying the production System is difficult and entails significant costs;
• a distributed IAS consists of several components, so their interaction and possible
failures must be considered.</p>
      <p>One common approach described in similar studies is to model the target
system. This approach can reduce costs and help assess the feasibility of design
solutions. Hybrid modeling, which combines analytical, full-scale, and virtual
modeling with simulation, makes it possible to investigate the System behavior in
various configurations, taking into account the features of the software and the
network infrastructure.</p>
    </sec>
    <sec id="sec-2">
      <title>Problem statement</title>
      <p>The main purpose of this paper is to demonstrate that hybrid modeling is applicable
to the simulation of complex distributed IAS for assessing scalability costs. By
scalability we mean the ability to modify the architecture so that the desired
quality of service (QoS) is maintained as new organizations begin using the System.
These organizations can be located far from each other, and their data can be logically
separated (e.g. different websites for different organizations). On the one hand, a
growing number of users leads to a growing number of requests to the System. On
the other hand, a large distance between users and servers degrades communication
quality.</p>
      <p>Client organizations may require the System to interact with other information
systems that they operate, including systems that process personal data. This may
add new requirements for communication channels, or a requirement to place some of
the System servers in the same separate DC that hosts the other servers of the
organization. Because the data on the scientific and pedagogical activity of personnel
in different organizations can be closely interrelated, there is the problem of
dividing DB-servers between organizations and choosing a data replication strategy.
The latter can significantly affect development decisions, but it has not yet been
defined for IAS “ISTINA”, and this issue is outside the scope of this study.</p>
      <p>The target problem in assessing the scalability of the System is to investigate how
objective user scores (such as web-page loading time) and subjective user scores
(QoS perception) of the performance of the System web-pages depend on network
infrastructure parameters: network latency and packet loss rate.</p>
      <p>
        Mean Opinion Score (MOS) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is a subjective user score whose values depend
logarithmically on the network parameters [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This metric is described in detail
in the “QoS parameters” section.
      </p>
      <p>Solving this problem can help to formulate recommendations for the
architectural scaling of the System.</p>
      <p>Hybrid modeling can also be applied in other areas of the System development, such as
decision-making on the development of the project infrastructure, systematic
performance testing based on realistic scenarios with realistic infrastructure, and
continuous integration (automatic testing). These applications are outside the scope
of this paper.</p>
    </sec>
    <sec id="sec-3">
      <title>Description of the distributed IAS modeling system</title>
      <p>In this paper, a model of a distributed information system consists of three
general components: nodes, network infrastructure, and the software installed on the nodes.</p>
      <p>
        There are several options for modeling nodes: a physical machine (real hardware
and software); a virtual machine (real software, emulated hardware); a logical node in
a simulation (each node has its own representation in the network, without hardware
or software modeling); and aggregate modeling of node behavior (nodes are logically
indistinguishable). Like nodes, networks can be represented in different ways: physical
networks (real network hardware); virtual networks (NAT, network bridge, isolated
network); and simulated networks (simulated with the NS-3 network simulator [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]). The entire
network infrastructure can be composed of parts modeled in different ways.
      </p>
      <p>The hybrid modeling approach combines these variants in different
proportions, depending on the purposes of the modeling, to provide the necessary level
of detail.</p>
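      <p>As an illustration only, the node and network variants listed above can be written down as a small data structure. This is a hypothetical sketch: the class names, field names, and node names are assumptions made for the example and do not reproduce the specification format of the deployment tools used in this work.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    PHYSICAL = "physical machine"      # real hardware and software
    VIRTUAL = "virtual machine"        # real software, emulated hardware
    LOGICAL = "logical node"           # simulated, individually represented
    AGGREGATE = "aggregate behavior"   # nodes are logically indistinguishable


class NetworkKind(Enum):
    PHYSICAL = "physical network"
    VIRTUAL = "virtual network"        # NAT, bridge, isolated network
    SIMULATED = "simulated network"    # for example NS-3


@dataclass
class Node:
    name: str
    kind: NodeKind
    networks: list = field(default_factory=list)  # names of attached networks


@dataclass
class ModelSpec:
    nodes: list
    networks: dict  # network name -> NetworkKind

    def is_hybrid(self):
        """True if nodes or networks are modeled in more than one way."""
        node_kinds = {n.kind for n in self.nodes}
        net_kinds = set(self.networks.values())
        return len(node_kinds) > 1 or len(net_kinds) > 1

    def dangling(self):
        """Names of networks that a node references but the spec lacks."""
        return sorted({net for n in self.nodes for net in n.networks
                       if net not in self.networks})


# Hypothetical example: two virtual machines plus an aggregate user model.
spec = ModelSpec(
    nodes=[
        Node("web1", NodeKind.VIRTUAL, ["internal"]),
        Node("db1", NodeKind.VIRTUAL, ["internal"]),
        Node("users", NodeKind.AGGREGATE, ["external"]),
    ],
    networks={"internal": NetworkKind.SIMULATED,
              "external": NetworkKind.VIRTUAL},
)
print(spec.is_hybrid(), spec.dangling())
```
</preformat>
      <p>Keeping the kind of each node and network explicit makes it easy to swap, for example, a simulated network for a virtual one without touching the rest of the description.</p>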
      <p>
        User behavior is simulated with Apache JMeter, an application designed to
load test functional behavior and measure the performance of web applications [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. JMeter
allows us to simulate scenarios consisting of various realistic HTTP requests to the
System from a large number of users.
      </p>
      <p>
        The distributed system model is deployed with dedicated virtual
deployment tools developed at MSU. This software can deploy a
distributed system model automatically from a given specification of nodes and
networks, which is convenient when the configuration changes frequently.
More details are given in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], although since the publication of these
articles the functionality of the developed tools has been significantly expanded.
      </p>
      <p>Architecturally, the System model is represented by one or several
web-servers and a DB-server. The System is built on the Django framework. Because
of the specifics of its ORM (Object-Relational Mapping) implementation, which is
characterized by a large number of queries to the DB, most of the traffic passes
between the web-servers and the DB. Therefore, in this study the variation of
network characteristics is modeled in this internal part of the IAS network infrastructure.</p>
    </sec>
    <sec id="sec-4">
      <title>QoS parameters</title>
      <p>
        Quality of Experience (QoE) is commonly used to describe how the user
perceives a service. MOS, a score on a scale from 1 (the user
stops using the service) to 5 (full satisfaction with the service quality), is a
convenient metric for describing QoE. Quality of Service (QoS), in turn,
denotes measurable service parameters such as packet transmission delay and service
response time. The dependence of MOS on QoS is described in many studies [
        <xref ref-type="bibr" rid="ref10 ref11 ref8 ref9">8-11</xref>
        ].
MOS is a subjective parameter because it depends not only on QoS parameters but
also on factors that cannot be measured: user expectations, service type, and purpose of use.
      </p>
      <p>One of the most common web service usage scenarios is web surfing and
search. Web-page loading time is the key characteristic for assessing quality in
this scenario, and it can be classified as a QoS parameter. Expectations that
users form before they start using a service have a significant impact on how they
perceive web-page performance. If a user expects a response within 1 minute and
the response actually arrives sooner, the user perceives this positively.</p>
      <p>
        Jakob Nielsen, one of the best-known user experience specialists, identifies three
main response-time limits in his studies of website user experience [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]:
instantaneous response (0.1 s: a feeling of direct manipulation; the user does not notice any waiting time);
seamless load (1 s: the user can sense a delay but feels comfortable; from 1 to 10 s the user
wishes the service were faster); and lost attention (10 s: the user starts thinking about other
things). This categorization assumes the highest user expectations of service
performance. A user who expects a time-consuming response will rate the same delay
differently.
      </p>
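      <p>The three limits above can be turned into a simple classifier of measured response times. The sketch below is illustrative: the label for the 1 to 10 s range and the treatment of values that fall exactly on a limit are assumptions, not part of the cited categorization.</p>
      <preformat preformat-type="code">
```python
from bisect import bisect_left

# Response-time limits in seconds, as listed in the text above.
LIMITS = [0.1, 1.0, 10.0]
LABELS = [
    "instantaneous",     # up to 0.1 s
    "seamless",          # up to 1 s
    "noticeably slow",   # from 1 to 10 s (label chosen for this sketch)
    "attention lost",    # more than 10 s
]


def classify_response_time(seconds):
    """Map a response time in seconds to one of the categories above."""
    if 0 > seconds:
        raise ValueError("response time must be non-negative")
    # bisect_left counts how many limits are strictly below the value.
    return LABELS[bisect_left(LIMITS, seconds)]


for t in (0.05, 0.8, 4.0, 35.0):
    print(t, classify_response_time(t))
```
</preformat>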
      <p>The attention-loss limit is determined by the type of service being used and
by the tasks being solved. For example, response time is an important criterion
for information search tasks but not for analytic tasks such as report preparation.
Web-page loading time therefore cannot be a universal criterion for QoE evaluation:
in most tasks the content of the web-page matters more to user perception, and
in report creation what matters is the final result.</p>
      <p>
        The dependence of user perception of service performance (in terms of MOS) on
web-page loading time is shown in Fig. 1, taken from [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
To quantify the dependence of QoE on QoS, the approach given in
Recommendation [
        <xref ref-type="bibr" rid="ref3">3</xref>
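        ] is used. The QoS parameter is represented by the session time, and QoE by
MOS. The Recommendation considers a scenario of searching for information on the Internet
that consists of the following steps: requesting a search page; submitting a search query;
displaying the search results. For this scenario the following logarithmic dependence of MOS on the
session time holds:
<disp-formula><tex-math>\mathrm{MOS} = \frac{4}{\ln(\mathrm{Min}/\mathrm{Max})} \cdot \bigl(\ln(\mathrm{ST}) - \ln(\mathrm{Min})\bigr) + 5,</tex-math></disp-formula>
where ST is the session time, and Min and Max are the minimum and maximum session times
obtained during the experiments, respectively.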
      </p>
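      <p>A minimal sketch of this logarithmic mapping is given below. It assumes the form MOS = 4 / ln(Min/Max) * (ln(ST) - ln(Min)) + 5, so that a session time equal to Min gives 5 and a session time equal to Max gives 1. The function name, the clamping to the 1..5 range, and the example values are assumptions made for illustration.</p>
      <preformat preformat-type="code">
```python
import math


def mos_from_session_time(session_time, min_time, max_time):
    """Logarithmic MOS estimate: 5 at min_time, 1 at max_time.

    All times are in seconds. Results outside the 1..5 scale are
    clamped, which is a choice made for this sketch.
    """
    if not (max_time > min_time > 0):
        raise ValueError("require max_time > min_time > 0")
    if not session_time > 0:
        raise ValueError("session_time must be positive")
    mos = 4.0 / math.log(min_time / max_time) * (
        math.log(session_time) - math.log(min_time)) + 5.0
    return max(1.0, min(5.0, mos))


# Illustrative values only: Min = 10 s, Max = 100 s.
for st in (10.0, 30.0, 100.0):
    print(st, round(mos_from_session_time(st, 10.0, 100.0), 2))
```
</preformat>
      <p>Because the mapping is logarithmic, the geometric mean of Min and Max, not their arithmetic mean, corresponds to the midpoint score of 3.</p>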
      <p>
        How well this formula correlates with subjective user scores depends on user
expectations. According to [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the achieved correlation was 0.95 for a waiting time of 60 s
and 0.72 for waiting times of less than 6 s.
      </p>
      <p>Thus, the formula estimates the dependence of QoE on QoS with high precision
for a waiting time of 60 s and with medium precision for waiting times of less than 60 s.</p>
    </sec>
    <sec id="sec-5">
      <title>Experiments</title>
      <p>This section considers a scenario of searching for information and browsing the System
web-pages. It consists of the following steps: signing in; searching for a worker in the
System by name; browsing the personal web-page of the worker found; browsing a
journal web-page; signing out. According to Google Analytics statistics, these
web-pages are the most visited, and the nature of this scenario allows us to use the
MOS evaluation approach described in the previous section.</p>
      <p>
        The System model is implemented using the modeling system described above. The
web-server and DB-server nodes are represented by virtual machines, and the network
infrastructure is simulated with either NS-3 or Netem (Traffic Control) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. In the first case the
System network infrastructure is modeled entirely by NS-3; in the second, virtual
networks are used and their characteristics are set by Netem.
      </p>
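      <p>For the Netem variant, network characteristics are typically set through the Linux <monospace>tc</monospace> utility. The sketch below only builds the argument list for such a call and does not execute it, since running it requires root privileges. The interface name <monospace>eth0</monospace> and the helper name are placeholders, and the exact commands used in the experiments are not given in the paper.</p>
      <preformat preformat-type="code">
```python
def netem_command(device, delay_ms, loss_percent):
    """Build a `tc` argument list that sets delay and loss on a device."""
    if 0 > delay_ms or not (100 >= loss_percent >= 0):
        raise ValueError("delay must be >= 0 and loss within 0..100")
    return [
        "tc", "qdisc", "add", "dev", device, "root", "netem",
        "delay", f"{delay_ms:g}ms",
        "loss", f"{loss_percent:g}%",
    ]


# "eth0" is a placeholder interface name.
print(" ".join(netem_command("eth0", 4, 3)))
```
</preformat>
      <p>The resulting list can be passed to <monospace>subprocess.run</monospace> on the machine that hosts the virtual network.</p>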
      <p>The session time in this experiment is taken to be the total execution
time of all scenario steps. User expectations were calculated as the session time with
no latency and no packet loss. With the given parameters, the expected loading time
of the worker web-page is about 35 s.</p>
      <p>The measurements were performed for latencies from 0 to 12 ms and for
packet loss rates from 0 to 10%. The results showed that the session
time depends linearly on latency for both network simulation approaches. With
networks simulated by NS-3, the session time varied from 35 s (0 ms delay,
0% packet loss) to 100 s (12 ms, 0%); these results are illustrated in Fig. 2. With
Netem, it varied from 40 s (0 ms, 0%) to 120 s (12 ms, 0%). The linear
nature of this dependence can be explained by the fact that operations are
processed sequentially during traffic processing. The dependence of the session time
on packet loss is non-linear. At 0 ms delay, the session time varied from 35 s
to 95 s (10% packet loss) for NS-3 networks (see Fig. 2) and from
40 s to 240 s for Netem-tuned networks. This non-linearity can be explained by
the specifics of the TCP implementation.</p>
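      <p>The linear dependence on latency can be made concrete with the two NS-3 endpoints reported above (35 s at 0 ms and 100 s at 12 ms, both at 0% packet loss). The sketch below interpolates between them and estimates how many sequential delays would account for the slope. It assumes that each sequential operation pays the configured latency exactly once, which the paper does not state; the function names are illustrative.</p>
      <preformat preformat-type="code">
```python
def linear_session_time(latency_ms, t_at_zero, t_at_max, max_latency_ms):
    """Session time in seconds, interpolated linearly between two endpoints."""
    slope = (t_at_max - t_at_zero) / max_latency_ms  # seconds per ms of latency
    return t_at_zero + slope * latency_ms


def implied_sequential_delays(t_at_zero, t_at_max, max_latency_ms):
    """Number of delays in sequence that would explain the added time."""
    extra_seconds = t_at_max - t_at_zero
    return extra_seconds / (max_latency_ms / 1000.0)


# Endpoints reported in the text for NS-3 with 0% packet loss.
NS3 = dict(t_at_zero=35.0, t_at_max=100.0, max_latency_ms=12.0)
print(linear_session_time(4.0, **NS3))
print(round(implied_sequential_delays(**NS3)))
```
</preformat>
      <p>Under this assumption the slope corresponds to several thousand sequential delays per session, which is consistent with a large number of DB queries being issued one after another.</p>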
      <p>MOS values were calculated with the formula from the “QoS parameters” section. For the delay variation
the results are as follows (see Fig. 3): 5 (0 ms delay, 0% packet loss), 4.4 (2 ms, 0%),
3.4 (4 ms, 0%), 2.5 (6 ms, 0%), 1.9 (8 ms, 0%), 1 (11 ms, 0%). MOS values for the
packet loss variation: 5 (0 ms, 0%), 4.4 (0 ms, 1%), 3.6 (0 ms, 2%), 3.1 (0 ms, 3%),
2.6 (0 ms, 4%), 2.1 (0 ms, 5%), 1.7 (0 ms, 6%), 1.4 (0 ms, 7%), 1 (0 ms, 8%).</p>
      <p>Based on these results we can formulate recommendations for the
required characteristics of the network segment between the web-server and the
DB-server. Taking 3 as the satisfactory MOS value, we recommend providing
a connection with a network delay below 4 ms and a packet loss rate
below 3%. The System web-servers should therefore be placed in the same DC as the
DB-servers if possible. Placing a web-server at a significant distance from the DB-servers
will lead to severe delays.</p>
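      <p>The recommended limits can be cross-checked against the MOS values listed above. The sketch below picks, for each parameter, the largest measured value at which MOS is still at least 3. It assumes that MOS decreases monotonically with the parameter, as the reported values do; the names are illustrative.</p>
      <preformat preformat-type="code">
```python
# (latency in ms, MOS) and (packet loss in %, MOS), as reported in the text.
MOS_BY_LATENCY = [(0, 5.0), (2, 4.4), (4, 3.4), (6, 2.5), (8, 1.9), (11, 1.0)]
MOS_BY_LOSS = [(0, 5.0), (1, 4.4), (2, 3.6), (3, 3.1), (4, 2.6),
               (5, 2.1), (6, 1.7), (7, 1.4), (8, 1.0)]


def largest_acceptable(points, min_mos=3.0):
    """Largest measured parameter value whose MOS is still at least min_mos."""
    acceptable = [value for value, mos in points if mos >= min_mos]
    if not acceptable:
        raise ValueError("no measured point reaches the required MOS")
    return max(acceptable)


print(largest_acceptable(MOS_BY_LATENCY), largest_acceptable(MOS_BY_LOSS))
```
</preformat>
      <p>With a threshold of 3, the last acceptable measured points are 4 ms of delay and 3% packet loss, which agrees with the limits recommended above.</p>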
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>This paper describes a hybrid modeling approach for distributed IAS and demonstrates
its possible applications to various scalability assessment tasks and to other tasks
that arise during the development of complex systems.</p>
      <p>The paper also gives an example of deriving recommended requirements
for the characteristics of the System network infrastructure in the context
of its deployment for new client organizations. The dependence of the session time on
network parameters such as latency and packet loss rate was obtained and analyzed.</p>
      <p>The results presented in this paper can be useful in the development of IAS
“ISTINA”. In particular, the observed dependence of the session time on the
network delay indicates that significant distances between the web-servers and
DB-servers (e.g. if they are placed in different remote DCs) will significantly
degrade user scores.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <source>The Intelligent System for Thematical Analysis of Scientometric Data (ISTINA)</source>
          /
          <string-name><given-names>V.A.</given-names> <surname>Sadovnichiy</surname></string-name>,
          <string-name><given-names>S.A.</given-names> <surname>Afonin</surname></string-name>,
          <string-name><given-names>A.V.</given-names> <surname>Bakhtin</surname></string-name>
          et al. - Moscow University Publishing,
          <year>2014</year>. - P. 262. (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <source>Mean opinion score</source>
          // URL: https://en.wikipedia.org/wiki/Mean_opinion_score (request date: 30.06.2017)
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <source>ITU-T G.1030: Estimating end-to-end performance in IP networks for data applications</source>
          // URL: https://www.itu.int/rec/T-REC-G.1030/en (request date: 30.06.2017)
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <source>What is NS-3?</source>
          // URL: https://www.nsnam.org/overview/what-is-ns-3/ (request date: 30.06.2017)
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <source>Apache JMeter</source>
          // URL: http://jmeter.apache.org/ (request date: 30.06.2017)
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name><surname>Vasenin</surname> <given-names>V.A.</given-names></string-name>,
          <string-name><surname>Roganov</surname> <given-names>V.A.</given-names></string-name>,
          <string-name><surname>Zenzinov</surname> <given-names>A.A.</given-names></string-name>
          <article-title>An environment for the study of the information security facilities in grid and cloud systems</article-title>
          //
          <source>Program engineering</source>. -
          <year>2014</year>. - No 3. - P.
          <fpage>21</fpage>-<lpage>33</lpage>
          (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name><surname>Zenzinov</surname> <given-names>A.</given-names></string-name>
          <article-title>Automated deployment of virtualization-based research models of distributed computer systems</article-title>
          //
          <source>Proceedings of the 7th Spring/Summer Young Researchers' Colloquium on Software Engineering (SYRCoSE 2013)</source>. - National Research Technical University Kazan, Russia: Kazan,
          <year>2013</year>. - P.
          <fpage>128</fpage>-<lpage>132</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name><given-names>Daisuke</given-names> <surname>Yamauchi</surname></string-name>
          and
          <string-name><given-names>Yoshihiro</given-names> <surname>Ito</surname></string-name>.
          <article-title>A Method of Evaluating Effect of QoS Degradation on Multidimensional QoE of Web Service with ISO-Based Usability</article-title>
          //
          <source>International Journal of Computer Networks &amp; Communications (IJCNC)</source>. - Vol.
          <volume>7</volume>, No. 1, January
          <year>2015</year>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. ITU-T Nazrul Islam Vijaya John David Elepe.
          <article-title>On Factors Affecting Web Browsing-QoE Over Time</article-title>
          // Master Thesis, Electrical Engineering, Thesis No: MEE100038. - January
          <year>2014</year>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name><given-names>S.</given-names> <surname>Canale</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Delli Priscoli</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Monaco</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Palagi</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Suraci</surname></string-name>.
          <article-title>A reinforcement learning approach for QoS/QoE model identification</article-title>
          //
          <source>Control Conference (CCC), 2015 34th Chinese</source>. - July
          <year>2015</year>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name><given-names>Markus</given-names> <surname>Fiedler</surname></string-name>,
          <string-name><given-names>Phuoc</given-names> <surname>Tran-Gia</surname></string-name>.
          <article-title>A Generic Quantitative Relationship between Quality of Experience and Quality of Service</article-title>
          //
          <source>IEEE Network</source>
          (Volume:
          <volume>24</volume>, Issue: 2, March-April 2010). - March
          <year>2010</year>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name><given-names>Jakob</given-names> <surname>Nielsen</surname></string-name>.
          <source>Website Response Times</source>.
          <year>2010</year>
          // URL: https://www.nngroup.com/articles/website-response-times/ (request date: 30.06.2017)
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <source>networking:netem</source>
          // Linux Foundation Wiki // URL: https://wiki.linuxfoundation.org/networking/netem (request date: 30.06.2017)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>