<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>May</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Data Performance Evaluation of Cloud Storage Providers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aleksandar Dimov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stanimir Kirov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Mathematics and Informatics, Sofia University</institution>
          ,
          <addr-line>5 James Bourchier Blvd., Sofia, 1164, Bulgaria</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>2</volume>
      <fpage>7</fpage>
      <lpage>28</lpage>
      <abstract>
        <p>Many current software systems are data-intensive, which presents new challenges not only to IT and software professionals but also to business and individual users. Some of these challenges relate to decisions on how to store the data that data-intensive systems work with. One common solution is to use cloud storage, which is most often offered by a third party. This paper presents a methodology for evaluating cloud storage providers in the realm of data-intensive systems, based on the fundamental operations that their services provide. It also compares the performance of some popular cloud storage services in terms of operation execution times.</p>
      </abstract>
      <kwd-group>
        <kwd>Data performance comparison</kwd>
        <kwd>cloud storage providers</kwd>
        <kwd>data-intensive systems</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>An important concern in the realm of data-intensive systems is how users
and businesses are going to store their data. Both regular and business users
increasingly rely on cloud-based storage solutions instead of on-premises
local storage hardware. The most significant reasons for this include security,
availability, scalability and cost-effectiveness. There is a growing tendency
to migrate data to the cloud, or to seriously consider building on the cloud
when developing new solutions. In this sense, software engineers
and IT professionals are interested in having the means for a well-informed
selection of specific solutions, based on quality of service.</p>
      <p>Additionally, most contemporary systems are data-intensive [1], [2],
which means that they rely heavily on data storage and on the quality
characteristics of such storage. Such systems also often perform data analysis and
analytical processing, which may be required to happen in real time. In this context,
it becomes especially important to optimize the performance of such systems.</p>
      <p>However, in the current environment it may be difficult to select an
appropriate cloud storage provider, as many such services exist. Users need a means to
select the best option in a straightforward way. One of the first things to do
when choosing between cloud services is to compare storage options,
features, and costs. The next concern is dependence on a single vendor for many
critical needs: if all data is in the hands of one service provider, the dependence
on that provider is huge. To avoid this, users may implement a multi-cloud
architecture. By using a multi-cloud storage connection tool, one can easily switch
between the cloud service providers that the tool supports.</p>
      <p>The goal of this paper is to provide a methodological framework for testing
cloud storage providers and to show particular results for some of the most
popular free storage services. The research question addressed by this study is “What
are the main factors that users should employ to evaluate cloud storage solutions,
and how can they pick the provider that is right for their needs?”.</p>
      <p>The rest of the paper is structured as follows: Section 2 gives an overview of
the related work in the area; Section 3 presents the methodological framework of
our approach to testing cloud storage providers; Section 4 describes the
specifics of the testing environment and the experiments we have made; Section 5
presents and analyses the results; and finally, Section 6 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>A number of research works relate directly to ours and aim at
performance comparison of cloud storage providers.</p>
      <p>For example, in [5] Google Cloud Service and iCloud are compared
by exploring the features of these two cloud storage services.</p>
      <p>In [6], the authors tested the performance of several cloud storage
providers, including Google Drive and Dropbox, and analysed their applicability in
healthcare services by using medical image files for testing and comparison. As
in this paper, the comparison was based on the duration of several
commands, including upload, download, and file deletion.</p>
      <p>Another comparison of some popular cloud storage services is provided in
[7]. The authors aim to help users choose the right cloud service for storage by
making a comparison on 10 different factors, including performance. Performance is
evaluated by uploading and downloading files of two different sizes.</p>
      <p>There also exist several non-academic surveys [3], [4] that try to rate cloud
storage providers; however, they do not focus on a methodological approach to
testing but rather just compare the properties of the different plans that cloud storage
providers offer.</p>
      <p>Another direction of research that has some relation to our work concerns
performance testing of various cloud services. For example, in [8]
High-Performance Computing (HPC) is evaluated by comparing the performance of the
Google services Cloud Functions (Function-as-a-Service) and Compute Engine
(Infrastructure-as-a-Service).</p>
      <p>In conclusion, there exists a lot of work on the comparison of cloud services and
of cloud storage in particular. However, in this paper we try to fill the gap in
cloud evaluation with respect to data-intensive systems. For this
purpose, in the next section we present our methodology for testing performance,
which is specifically targeted at storage service operations.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Comparison methodology</title>
      <p>This section explains the methodological approach for comparison
between different cloud storage providers.</p>
      <p>The test environment should be fully isolated from other applications, in
order to prevent data interference. An additional application is also needed to
provide a bridge between the test environment and the cloud providers under test. It
also serves as a wrapper that allows access to different cloud providers and
provides the same, fair conditions for all of them.</p>
      <p>We perform the test in three main phases:</p>
      <list list-type="order">
        <list-item>
          <p>Pre-test phase – a share is created, which is used in the test execution
phase to check the performance of cloud data storage providers.</p>
        </list-item>
        <list-item>
          <p>Test execution phase – this phase consists of executing nine operations
that are common to every operating system, and measuring the execution time of each
of them. These operations are the following:</p>
          <list list-type="alpha-lower">
            <list-item>
              <p>Create share – creates a location for storing files;</p>
            </list-item>
            <list-item>
              <p>List share – shows the files in the listed share or directory;</p>
            </list-item>
            <list-item>
              <p>Move share – moves a directory, its subdirectories (if available)
and its files within the share;</p>
            </list-item>
            <list-item>
              <p>Copy share – copies a directory, its subdirectories (if available)
and its files within the share;</p>
            </list-item>
            <list-item>
              <p>Delete share – deletes a directory, its subdirectories (if available)
and its files within the share;</p>
            </list-item>
            <list-item>
              <p>Upload file – transfers data from the source (the local computer) to the
destination (the cloud share, in the case of this project);</p>
            </list-item>
            <list-item>
              <p>Download file – transfers data from the source (the cloud share) to the
destination (the local computer);</p>
            </list-item>
            <list-item>
              <p>Copy file – copies a file;</p>
            </list-item>
            <list-item>
              <p>Delete file – removes a file from the share created by the Create share
operation.</p>
            </list-item>
          </list>
        </list-item>
        <list-item>
          <p>Post-test phase – this phase prepares for the next iteration
of the test execution phase. It includes cleaning up the test file that was
created during the previous phase. This is needed because the free accounts that are
used have limited storage space.</p>
        </list-item>
      </list>
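      <p>The three phases can be sketched as a small timing harness. The following
Python sketch is illustrative only and is not the implementation used in the experiment:
the names OPERATIONS, timed, run_iteration and run_test are hypothetical, the order of
operations simply follows the list above, and every operation is passed in as a callable
so that any storage wrapper can be plugged in.</p>
      <preformat><![CDATA[
```python
import time
from typing import Callable, Dict, List

# Hypothetical identifiers for the nine operations, in the order listed above.
OPERATIONS: List[str] = [
    "create_share", "list_share", "move_share", "copy_share", "delete_share",
    "upload_file", "download_file", "copy_file", "delete_file",
]

Action = Callable[[], None]


def timed(action: Action) -> float:
    """Run one action and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    action()
    return time.perf_counter() - start


def run_iteration(pre_test: Action, actions: Dict[str, Action],
                  post_test: Action) -> Dict[str, float]:
    """Run pre-test, the nine timed operations, then post-test cleanup."""
    pre_test()
    try:
        return {name: timed(actions[name]) for name in OPERATIONS}
    finally:
        # Cleanup runs even if an operation fails, so that the limited
        # storage of a free account is released for the next iteration.
        post_test()


def run_test(pre_test: Action, actions: Dict[str, Action],
             post_test: Action, iterations: int = 3) -> Dict[str, float]:
    """Repeat the iteration and return the average time per operation."""
    runs = [run_iteration(pre_test, actions, post_test)
            for _ in range(iterations)]
    return {name: sum(run[name] for run in runs) / len(runs)
            for name in OPERATIONS}


if __name__ == "__main__":
    def noop() -> None:
        """Placeholder standing in for a real storage operation."""

    averages = run_test(noop, {name: noop for name in OPERATIONS}, noop)
    for name, seconds in averages.items():
        print(f"{name}: {seconds:.6f} s")
```
]]></preformat>
      <p>Passing the operations as callables keeps the harness independent of any
particular provider, which matches the requirement that all providers are tested
under the same conditions.</p>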
      <p>Testing of a cloud storage provider should be performed while treating it as a
black box. Normally, one should not be able to obtain any internal
information about the cloud architecture or infrastructure, as this would be considered
a security breach, and if that happens the cloud infrastructure could be classified as
highly unreliable. For this reason, we use an opaque testing technique. With this
technique, only the fundamental aspects of the system are explored. In that way, more
data may be collected, and conclusions about the behaviour and response of different
cloud storage vendors under our setup can be accurate.</p>
      <p>To perform the test, we should meet the following requirements,
which are intended to provide the fairest possible test conditions:</p>
      <list list-type="order">
        <list-item>
          <p>A single platform or application should be used to access the different cloud
storage providers.</p>
        </list-item>
        <list-item>
          <p>Virtualization should be used, limited to a single virtual
machine. This provides an isolated environment and is a safe, efficient, cheap
and flexible way to test applications: one can test everything from server
configurations to resource allocation and, most importantly for us, storage.</p>
        </list-item>
        <list-item>
          <p>The operating system should be undemanding and handle resources well,
so that it interferes as little as possible with the application and the test
results are as accurate as they can be.</p>
        </list-item>
        <list-item>
          <p>It should be considered that cloud storage has different characteristics
for different uses (different end users or companies could make use of the
service in different ways). For this reason, we focus only on file-system
based operations and use a single application to access the different
cloud storage services offered by vendors.</p>
        </list-item>
      </list>
    </sec>
    <sec id="sec-4">
      <title>4. Building the testing environment</title>
      <p>We use the Rclone command-line tool as an intermediary
application between the client and the cloud provider service, which provides the
integration between them. Rclone is a tool written in the Go programming
language that is used to upload and download data between a computer and cloud-hosted
data storage. It can connect to various cloud storage services. In this way, the
requirement for a single platform with access to the different cloud storage
services offered by vendors is fulfilled.</p>
      <p>Another objective of using the Rclone command-line tool is to build a
multi-service cloud delivery model. By developing and implementing it, we can compare
the supported storage services from a performance perspective. The architecture of
the test environment is shown in Figure 2.</p>
      <p>To provide virtualization, Oracle VirtualBox is used. It is a deceptively
simple, but powerful and free to use cross-platform virtualization application for x86
hardware, targeted at server, desktop and embedded use [5].</p>
      <p>The CentOS Linux distribution was used as the operating system, as it is a
stable, predictable, manageable, and reproducible platform derived from the sources
of Red Hat Enterprise Linux [6], [7]. It is available free of charge, and technical
support is primarily provided by the community via official mailing lists, web
forums, and chat rooms. Other reasons for choosing it for our work are that it has
good documentation, is highly customizable, and is supported by VirtualBox.</p>
      <p>As defined in the methodology description in Section 3, we have to
implement the operations that are most used on storage. In the list below each operation
is shown together with the specific Rclone command that was used to execute it:</p>
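      <p>As an illustration only, the nine operations could be mapped to Rclone
subcommands as in the following Python sketch. This mapping is an assumption and is not
necessarily the set of commands used in the experiment; the function names, the remote
name, the share name, the directory names and the file paths are all hypothetical. The
sketch relies only on the standard Rclone subcommands mkdir, lsf, move, copy, purge,
copyto and deletefile.</p>
      <preformat><![CDATA[
```python
import posixpath
import subprocess
import time
from typing import Dict, List


def rclone_commands(remote: str, share: str, local_file: str,
                    download_dir: str) -> Dict[str, List[str]]:
    """Build one rclone command line per operation (assumed mapping)."""
    root = f"{remote}:{share}"
    name = posixpath.basename(local_file)
    return {
        # Share (directory) operations; "dir" is a hypothetical folder name.
        "create_share": ["rclone", "mkdir", f"{root}/dir"],
        "list_share": ["rclone", "lsf", root],
        "move_share": ["rclone", "move", f"{root}/dir", f"{root}/dir-moved"],
        "copy_share": ["rclone", "copy", f"{root}/dir-moved",
                       f"{root}/dir-copy"],
        "delete_share": ["rclone", "purge", f"{root}/dir-copy"],
        # File operations on the test file.
        "upload_file": ["rclone", "copyto", local_file, f"{root}/{name}"],
        "download_file": ["rclone", "copyto", f"{root}/{name}",
                          posixpath.join(download_dir, name)],
        "copy_file": ["rclone", "copyto", f"{root}/{name}",
                      f"{root}/{name}.copy"],
        "delete_file": ["rclone", "deletefile", f"{root}/{name}.copy"],
    }


def run_timed(command: List[str]) -> float:
    """Execute a command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Only prints the command lines; nothing is sent to a cloud provider.
    commands = rclone_commands("gdrive", "perf-test",
                               "/tmp/testfile.bin", "/tmp/download")
    for operation, command in commands.items():
        print(operation, "->", " ".join(command))
```
]]></preformat>
      <p>Building the command lines separately from executing them makes it possible to
inspect the exact commands before any time is measured, and to reuse the same commands
for every configured remote.</p>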
    </sec>
    <sec id="sec-5">
      <title>5. Experiment results and analysis</title>
      <p>This section presents the results of the performance comparison of cloud service
providers. After presenting the particular and average execution times of each
command listed in the previous section, we also analyse the different
providers based on their pricing plans. To perform the test, a single 1 GB file
with randomly generated contents is used.</p>
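      <p>A test file of this kind can be produced in several ways. The following Python
sketch shows one possibility and is not necessarily how the file used in the experiment
was created; the function name write_random_file and the file name are hypothetical, and
interpreting 1 GB as 1024<sup>3</sup> bytes is an assumption.</p>
      <preformat><![CDATA[
```python
import os
import tempfile


def write_random_file(path: str, size_bytes: int,
                      chunk_size: int = 1024 * 1024) -> None:
    """Write size_bytes of random data to path, one chunk at a time."""
    remaining = size_bytes
    with open(path, "wb") as handle:
        while remaining > 0:
            # os.urandom returns exactly the requested number of bytes.
            block = os.urandom(min(chunk_size, remaining))
            handle.write(block)
            remaining -= len(block)


if __name__ == "__main__":
    # Small demonstration; pass 1024 ** 3 as the size for a 1 GB file.
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, "testfile.bin")
        write_random_file(target, 5 * 1024 * 1024)
        print(os.path.getsize(target), "bytes written")
```
]]></preformat>
      <p>Writing in chunks keeps memory use constant regardless of the file size, and
random contents reduce the chance that compression or deduplication on the transfer
path influences the measured times.</p>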
      <p>All times shown in the tables and figures in this section, as well as in the
appendix, are in the format minutes:seconds (mm:ss).</p>
      <p>The experiment described in this section was undertaken under two
important assumptions:</p>
      <list list-type="bullet">
        <list-item>
          <p>We test only the free services delivered by cloud storage
providers. This is an important assumption, because a given cloud service provider
may limit the resources available to its free tier, while increasing
or removing that limit for the paid plans.</p>
        </list-item>
        <list-item>
          <p>The analysis of pricing plans of cloud storage providers covers only
the monthly plans of each provider, and only those for personal users. This is
important because many providers offer additional services on top of
storage, which may influence the price of storage. Cloud providers also offer
additional subscriptions, such as annual plans, family plans, and business and
enterprise plans, which may vary significantly in terms of pricing.</p>
        </list-item>
      </list>
      <sec id="sec-5-1">
        <title>5.1. Test Results</title>
        <p>The results of the tests performed with the environment and methodology
described in Section 4 are shown in Table 1.</p>
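        <p>As seen from Table 1, the performance of the three compared cloud storage
providers is similar, with slight underperformance of OneDrive in
share/directory operations (Figure 3). The performance of all three providers in
upload/download is also similar (Figure 4). A more detailed table with test results is
presented in Appendix 1.</p>
        <table-wrap id="tab-1">
          <label>Table 1</label>
          <caption>
            <p>Execution time of each operation for the three tested providers (mm:ss)</p>
          </caption>
          <table>
            <thead>
              <tr><th>Operation</th><th>Provider 1</th><th>Provider 2</th><th>Provider 3</th></tr>
            </thead>
            <tbody>
              <tr><td>Create share</td><td>00:01.7</td><td>00:02.4</td><td>00:01.3</td></tr>
              <tr><td>List share</td><td>00:00.8</td><td>00:01.4</td><td>00:00.9</td></tr>
              <tr><td>Move share</td><td>00:01.3</td><td>00:02.5</td><td>00:02.0</td></tr>
              <tr><td>Copy share</td><td>00:01.4</td><td>00:02.8</td><td>00:01.4</td></tr>
              <tr><td>Delete share</td><td>00:01.2</td><td>00:01.8</td><td>00:01.4</td></tr>
              <tr><td>Upload</td><td>03:57.4</td><td>04:36.2</td><td>04:32.7</td></tr>
              <tr><td>Download</td><td>05:41.3</td><td>06:06.8</td><td>02:30.1</td></tr>
              <tr><td>Copy</td><td>00:04.7</td><td>00:06.7</td><td>00:02.9</td></tr>
              <tr><td>Delete</td><td>00:02.2</td><td>00:03.1</td><td>00:01.7</td></tr>
              <tr><td>Duration of all tests</td><td>09:52.0</td><td>11:03.8</td><td>07:14.3</td></tr>
            </tbody>
          </table>
        </table-wrap>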
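        <p>The "Duration of all tests" row can be cross-checked against the per-operation
times. The following Python sketch parses the mm:ss values of the three columns of
Table 1 and sums them; the function names are hypothetical and the columns are referred
to only by position. For the first column the sum reproduces the reported total, and for
the other two it differs from the reported total by 0.1 s, which would be consistent
with rounding of the individual entries.</p>
        <preformat><![CDATA[
```python
from typing import List


def parse_tenths(text: str) -> int:
    """Convert an 'mm:ss.s' string to an integer number of tenths of a second."""
    minutes, seconds = text.split(":")
    return int(minutes) * 600 + round(float(seconds) * 10)


def format_tenths(tenths: int) -> str:
    """Convert tenths of a second back to the 'mm:ss.s' format."""
    minutes, rest = divmod(tenths, 600)
    return f"{minutes:02d}:{rest // 10:02d}.{rest % 10}"


def column_total(times: List[str]) -> str:
    """Sum a column of 'mm:ss.s' values using integer arithmetic."""
    return format_tenths(sum(parse_tenths(value) for value in times))


# Per-operation times of the three columns of Table 1, in row order.
COLUMNS = [
    ["00:01.7", "00:00.8", "00:01.3", "00:01.4", "00:01.2",
     "03:57.4", "05:41.3", "00:04.7", "00:02.2"],
    ["00:02.4", "00:01.4", "00:02.5", "00:02.8", "00:01.8",
     "04:36.2", "06:06.8", "00:06.7", "00:03.1"],
    ["00:01.3", "00:00.9", "00:02.0", "00:01.4", "00:01.4",
     "04:32.7", "02:30.1", "00:02.9", "00:01.7"],
]
# Totals as reported in the last row of Table 1.
REPORTED_TOTALS = ["09:52.0", "11:03.8", "07:14.3"]

if __name__ == "__main__":
    for times, reported in zip(COLUMNS, REPORTED_TOTALS):
        print("computed", column_total(times), "reported", reported)
```
]]></preformat>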
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Pricing plans evaluation</title>
        <p>All cloud storage providers have consumer storage plans and also support
different storage plans for business. Here we focus on consumer storage
plans. Please note that all prices refer to individual accounts and are not
options for businesses. Also, depending on the plan, every provider offers bonus
features that are not part of our research.</p>
        <p>This research shows that the pricing plans of the tested cloud storage providers
are almost the same. However, it should be noted that Google Drive offers the
largest storage space in its free plan. It is also one of the most generous cloud
storage providers with its plans, even though the free storage is shared
between the different services that it offers.</p>
        <table-wrap>
          <table>
            <thead>
              <tr><th>Storage</th><th>Price</th></tr>
            </thead>
            <tbody>
              <tr><td>15 GB</td><td>Free</td></tr>
              <tr><td>30 GB</td><td>$6</td></tr>
              <tr><td>2 TB</td><td>$12</td></tr>
              <tr><td>5 TB</td><td>$18</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>At first glance, Dropbox probably has the best pricing offers for bigger
storage needs and offers the best price-per-space ratio. However, it should also
be noted that most providers offer users a large number of other services
together with the storage. This requires a more complex methodology and
criteria for pricing comparison of cloud storage providers.</p>
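        <p>The price-per-space ratio mentioned above can be computed directly from a list
of plans. The following Python sketch applies it to the storage tiers and monthly prices
listed in this section; the function name price_per_tb is hypothetical, and the
conversion 1 TB = 1000 GB is an assumption. Bundled services are ignored, so the result
is only a first approximation.</p>
        <preformat><![CDATA[
```python
from typing import List, Tuple


def price_per_tb(storage_gb: float, monthly_price_usd: float) -> float:
    """Return the monthly price in USD per terabyte (1 TB = 1000 GB assumed)."""
    return monthly_price_usd * 1000.0 / storage_gb


# Paid tiers as (storage in GB, monthly price in USD), taken from the
# pricing table in this section; the free 15 GB tier is left out.
PLANS: List[Tuple[float, float]] = [
    (30.0, 6.0),
    (2000.0, 12.0),
    (5000.0, 18.0),
]

if __name__ == "__main__":
    for storage_gb, price in PLANS:
        ratio = price_per_tb(storage_gb, price)
        print(f"{storage_gb:g} GB at ${price:g}: ${ratio:.2f} per TB per month")
```
]]></preformat>
        <p>For these tiers the ratio falls from 200 USD per TB for the 30 GB plan to 6 USD
and 3.60 USD per TB for the 2 TB and 5 TB plans, which shows how strongly the ratio
depends on the plan size being compared.</p>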
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>For data-intensive systems, it is worthwhile to be able to evaluate the
different storage options available to small businesses and individual users. In
contemporary systems, most data is stored in the cloud using the services of
different cloud storage providers. This paper presents a methodological framework
for evaluating cloud storage providers in terms of their performance parameters. It
also presents details of a specific testing environment and the results of testing the
performance of three popular cloud providers that offer free storage options.
Additionally, a comparison of the pricing plans of these providers is performed;
however, it is difficult to assess them in this respect, as most subscriptions include
other services besides storage.</p>
      <p>It should be noted that bandwidth limitations are a certain drawback of cloud
solutions, and that the end-user network is a very important part of the
cloud service. If the network is slow and unstable, users may have trouble accessing or
sharing files, and it may even become impossible to work in this kind of environment.
An investigation of how the end-user network affects the performance of cloud
storage providers is part of our further research.</p>
      <p>Directions for future research include:</p>
      <list list-type="bullet">
        <list-item>
          <p>Extending the comparison to more service providers.</p>
        </list-item>
        <list-item>
          <p>Development of a methodology for comparison of other quality
characteristics of cloud storage providers, such as reliability, availability, security and
cost-effectiveness. It may also be beneficial to define a compound
measure of cloud storage quality of service, by combining the results of the
various tests of such characteristics.</p>
        </list-item>
        <list-item>
          <p>The experiment may be expanded to include more diverse tests, for
example with various file sizes, or a single transaction with a large number of files
(both small and large ones).</p>
        </list-item>
      </list>
    </sec>
    <sec id="sec-7">
      <title>7. Acknowledgements</title>
      <p>The research presented in this paper is partially supported by the Sofia
University “St. Kliment Ohridski” Research Science Fund project No.
80-10145/23.05.2022 – “Data intensive software architectures”.</p>
      <p>The authors are also grateful to the anonymous reviewers for their
valuable comments and remarks, which helped to increase the quality of the paper.</p>
    </sec>
    <sec id="sec-8">
      <title>Appendix: Detailed results of performance comparison</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name><surname>Kleppmann</surname>, <given-names>M.</given-names></string-name>
          (<year>2017</year>).
          <source>Designing Data-Intensive Applications</source>.
          <publisher-name>O'Reilly</publisher-name>, <publisher-loc>Beijing</publisher-loc>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Hey</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tansley</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tolle</surname>
            ,
            <given-names>K. M.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Jim Gray on eScience: a transformed scientific method</article-title>
          .
          <source>The Fourth Paradigm.</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <source>Best cloud storage providers</source>
          <year>2021</year>, available at:
          <ext-link ext-link-type="uri" xlink:href="https://cloudstorageinfo.org/top-10-cloud-storage-providers">https://cloudstorageinfo.org/top-10-cloud-storage-providers</ext-link>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name><given-names>H.</given-names> <surname>Arif</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Hajjdiab</surname></string-name>,
          <string-name><given-names>F. A.</given-names> <surname>Harbi</surname></string-name> and
          <string-name><given-names>M.</given-names> <surname>Ghazal</surname></string-name>,
          “<article-title>A Comparison between Google Cloud Service and iCloud</article-title>,”
          <source>2019 IEEE 4th International Conference on Computer and Communication Systems (ICCCS)</source>,
          <year>2019</year>, pp. <fpage>337</fpage>-<lpage>340</lpage>,
          doi: <pub-id pub-id-type="doi">10.1109/CCOMS.2019.8821744</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name><surname>Roy</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Singh</surname>, <given-names>M.</given-names></string-name>
          (<year>2022</year>).
          <article-title>Performance Evaluation and Comparison of Various Personal Cloud Storage Services for Healthcare Images</article-title>.
          In:
          <person-group person-group-type="editor">
            <string-name><surname>Tavares</surname>, <given-names>J.M.R.S.</given-names></string-name>,
            <string-name><surname>Dutta</surname>, <given-names>P.</given-names></string-name>,
            <string-name><surname>Dutta</surname>, <given-names>S.</given-names></string-name>,
            <string-name><surname>Samanta</surname>, <given-names>D.</given-names></string-name>
          </person-group>
          (eds)
          <source>Cyber Intelligence and Information Retrieval. Lecture Notes in Networks and Systems</source>,
          vol <volume>291</volume>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name><surname>Zenuni</surname>, <given-names>X.</given-names></string-name>,
          <string-name><surname>Ajdari</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Ismaili</surname>, <given-names>F.</given-names></string-name>,
          <string-name><surname>Raufi</surname>, <given-names>B.</given-names></string-name>
          (<year>2014</year>).
          <article-title>Cloud storage providers: A comparison review and evaluation</article-title>.
          <volume>883</volume>,
          <fpage>272</fpage>-<lpage>277</lpage>.
          doi: <pub-id pub-id-type="doi">10.1145/2659532.2659609</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Malla</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Christensen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>HPC in the cloud: Performance comparison of function as a service (FaaS) vs infrastructure as a service (IaaS)</article-title>
          .
          <source>Internet Technology Letters</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          ),
          <elocation-id>e137</elocation-id>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>