<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic scalability of the OpenStack object storage</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oleksandr Porunov</string-name>
          <email>alexandr.porunov@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>7 Chkalov Str.</institution>
          ,
          <addr-line>Kharkiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Key Terms. DataWarehousing, DataCloud, ServiceOrchestration, Technology</institution>
          ,
          <addr-line>Object</addr-line>
        </aff>
      </contrib-group>
      <fpage>55</fpage>
      <lpage>62</lpage>
      <abstract>
        <p>Modern large web services should be designed for scalability and high availability. Modern high-load projects must cope with the loss of a server, a rack of servers, a data center or even several data centers. It is not acceptable for a large business to stop because of server overload or the loss of a network element. Small businesses that aim to grow in the near future also have to design their architecture to be easily scalable. The most non-trivial task is to construct a data warehouse, because it is a stateful service and many servers must be managed to provide storage large enough for all users' data. The paper describes methods that can be used to scale OpenStack object storage. Particular attention is paid to automating object storage scalability. Solutions for scaling OpenStack Swift are suggested.</p>
      </abstract>
      <kwd-group>
        <kwd>Storage</kwd>
        <kwd>Swift</kwd>
        <kwd>OpenStack</kwd>
        <kwd>Scalable</kwd>
        <kwd>Automatic</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Object storage plays a significant role in IT. Data grows continuously, which is
why a scalable solution is required to store new data. OpenStack Swift is one of the
most popular open source object storage systems. It has been designed to hold large amounts of data and
to be as scalable as possible. OpenStack Swift is highly available because it
stores replicas of data as far as possible from each other [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. To describe its nodes,
Swift uses ring files, which contain all the information about the devices in the cluster [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Stored data is at risk without access control. Thus OpenStack Swift supports
OpenStack Keystone out of the box.</p>
      <p>
        OpenStack Keystone is an OpenStack service that provides API client
authentication, service discovery, and distributed authorization by implementing the OpenStack
Identity API [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Keystone is able to use either LDAP or an SQL server as a backend.
      </p>
      <p>OpenStack Keystone is almost always used with OpenStack Swift because it
provides convenient authentication and authorization. OpenStack Keystone supports
different authentication methods, such as password-based, token-based and EC2-based
methods. Custom authentication methods can also be added to Keystone.</p>
      <p>Projects that use OpenStack object storage at large scale need
automated storage scaling, because it is almost impossible to scale large storage clusters manually.</p>
      <p>The goal of the article is to develop an algorithm which can be used to
automatically scale both the OpenStack object storage and the OpenStack identity service.</p>
    </sec>
    <sec id="sec-2">
      <title>The problem of scalability</title>
      <sec id="sec-2-1">
        <title>Keystone scalability problem</title>
        <p>
          The most popular and secure method to authenticate users in Keystone is the token
based method. The problem is that it doesn't scale easily out of the box. Keystone has several
techniques to generate tokens: UUID, PKI, PKIZ and Fernet [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>The UUID method generates a random string, stores it in a database and returns it as a token.
When a token is validated, Keystone looks it up in the database and compares
the two values. The main problem is that all tokens need to be stored in the database in order to
be validated. Several layers of cache can be enabled, but
the database is still the component that must be consulted.</p>
        <p>
          The PKI method generates tokens which are signed documents that contain the
authentication context as well as the service catalog. The Identity service uses public/private
key pairs and certificates to create and validate PKI tokens [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The problem
with this type of token is that it can be very long, depending on the size of the OpenStack
deployment.
        </p>
        <p>PKIZ tokens are the same as PKI tokens. The only difference is that PKIZ tokens
are compressed to help mitigate the size issue of PKI. PKI and PKIZ tokens are
deprecated and are not supported in the Ocata release.</p>
        <p>
          Fernet tokens are extremely lightweight and are not stored anywhere (the same is
true of PKI and PKIZ tokens). To generate a fernet token, Keystone uses a key which
is stored on the Keystone machine. For security reasons it is recommended to replace the
key after some period of use, because a malefactor who finds out the key will be able to
generate their own tokens and use all services as an administrator. To verify a token
on any machine, the same key must be stored on all Keystone machines. It would be
hard to update a key on all servers at the same moment. That is why the fernet method
provides different types of keys.
─ The primary key is the key which is used to generate and verify tokens. After key
rotation the primary key becomes a secondary key [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
─ Secondary keys are used only to verify tokens. A limit can be set on the number of
these keys. When the number of secondary keys exceeds the limit, the oldest key is deleted [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
─ The staged key is the key which will be used as the next primary key. After key rotation
the staged key becomes the primary key and a new staged key is generated [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
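        <p>The three key roles can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class name KeyRepository, the max_secondary parameter and the use of random hex strings as keys are hypothetical and are not taken from Keystone. Including the staged key among the verification keys is also an assumption of this sketch, made so that a node which has not yet received the rotated keys can still verify tokens issued by a node that has.</p>
<preformat>
```python
from dataclasses import dataclass, field
import secrets


def new_key() -> str:
    # Random hex string used as a placeholder for real key material.
    return secrets.token_hex(16)


@dataclass
class KeyRepository:
    """Toy model of the primary, secondary and staged key roles."""
    max_secondary: int = 2
    primary: str = field(default_factory=new_key)
    staged: str = field(default_factory=new_key)
    secondary: list = field(default_factory=list)  # oldest key first

    def rotate(self) -> None:
        # The old primary key is kept, but only for verification.
        self.secondary.append(self.primary)
        # The staged key is promoted and a new staged key is generated.
        self.primary = self.staged
        self.staged = new_key()
        # Delete the oldest secondary keys once the limit is exceeded.
        while len(self.secondary) > self.max_secondary:
            self.secondary.pop(0)

    def verification_keys(self) -> list:
        # Assumption of this sketch: every key that is still held may
        # be tried when a token is verified.
        return [self.primary, self.staged] + self.secondary
```
</preformat>
        <p>After one call to rotate(), the previous staged key is the primary key and the previous primary key is the only secondary key; after enough further rotations the oldest keys drop out of the set.</p>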
        <p>Because of these key types it is possible to rotate a key on any Keystone machine
and then distribute the new keys to all other machines. Still, automation and
distribution problems remain: Keystone doesn't have a tool to update and distribute keys on
all machines.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Swift scalability problem</title>
        <p>OpenStack Swift scales to enormous sizes, but scaling it involves some complexity.
To manage all drives, it stores all information about the devices of the cluster in
special files called rings. Data is distributed around the cluster with a modified consistent
hash ring algorithm. When a device is added or removed, or a device weight is changed,
new ring files must be distributed to all Swift nodes. After a node has received new
ring files it starts a replication process.</p>
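        <p>As a rough illustration of the hashing step behind a ring lookup, the sketch below maps an object path to a partition number by keeping the top bits of an MD5 digest. It is a simplified sketch, not Swift's implementation: the function name partition_for is hypothetical, and the cluster-specific hash salt and the table that assigns partitions to devices are left out.</p>
<preformat>
```python
import hashlib
import struct


def partition_for(path: str, part_power: int) -> int:
    """Map an object path to one of 2**part_power partitions."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    # Read the first four bytes as a big-endian unsigned integer and
    # keep only its top part_power bits.
    top32 = struct.unpack_from(">I", digest)[0]
    return top32 >> (32 - part_power)
```
</preformat>
        <p>Because the partition of an object depends only on its path, adding a device changes which device holds a partition, not which partition an object belongs to. This is why new ring files alone are enough to trigger data movement.</p>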
        <p>It would be hard to distribute new ring files to all nodes and move
replicas to new locations at the same moment. That is why a special parameter, "min part hours",
was created. It serves two purposes. Firstly, it is the time after which a ring can
be rebalanced again. Secondly, when Swift moves replicas it moves only one replica of a partition,
and the others stay locked in place until either the replica movement is done or "min
part hours" have passed.</p>
        <p>
          Still, the problem with Swift scalability is that adding or removing a device, or changing its
weight, causes a movement storm, and the cluster suffers for some period of
time. For example, when drives are added to a Swift cluster, it redistributes its data to
the new drives immediately. For instance, if a 4 TB drive is added to a cluster
which is 50% full, about 2 TB of data is moved to that drive. Over a 10 Gb Ethernet
port this would take about 27 minutes at 100% utilization, assuming the source drives
have enough capacity to send the data, the new drive can consume it at that pace, and
the network switches can support that transfer. In practice, degraded performance
of the cluster will last for hours [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. To prevent degraded performance of the cluster,
capacity can be added gradually, but the problem is that there are no published
methods which describe how to do it automatically.
        </p>
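        <p>The 27 minute figure can be checked with a short calculation. The sketch assumes decimal units (1 TB = 10^12 bytes, 10 Gb/s = 10^10 bits per second) and an ideal link with no protocol overhead; the function name transfer_minutes is introduced here for illustration only.</p>
<preformat>
```python
def transfer_minutes(data_bytes: float, link_bits_per_second: float) -> float:
    """Ideal transfer time in minutes at 100% link utilization."""
    return data_bytes * 8 / link_bits_per_second / 60


drive_capacity = 4e12      # 4 TB drive, in bytes
cluster_fill_ratio = 0.5   # the cluster is 50% full
link_speed = 10e9          # 10 Gb Ethernet, in bits per second

to_move = drive_capacity * cluster_fill_ratio
minutes = transfer_minutes(to_move, link_speed)
# prints: 2 TB over 10 Gb Ethernet: about 27 minutes
print(f"{to_move / 1e12:.0f} TB over 10 Gb Ethernet: about {minutes:.0f} minutes")
```
</preformat>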
      </sec>
    </sec>
    <sec id="sec-3">
      <title>The solution of scalability</title>
      <sec id="sec-3-1">
        <title>Keystone scalability solution</title>
        <p>As Keystone uses an SQL database to store user information, key information
can be stored in the same database. It is advisable to use a highly available
database. MariaDB Galera Cluster, Percona XtraDB Cluster or a similar product can be
a good choice. Both have a multi-master architecture.</p>
        <p>It would be hard to manage cron jobs on all nodes, which is why configuration
management software can be used. Any configuration
management software can be chosen, but it is advisable to choose software which scales easily and can be
highly available. SaltStack is a good choice as it supports different backends and can
work in a multi-master mode with a failover option.</p>
        <p>The problem with key rotation is that keys must be rotated on only one node at
a time. If keys are rotated on different hosts at the same time, then different hosts
will have different keys, and several such rotations will leave the hosts with
inconsistent key sets. To prevent this situation, global locks should be used across the
cluster to decide which node rotates keys at a particular time. It is possible to use
a leader election technique, but it consumes slightly more CPU and bandwidth, so global
locks are the better choice in this situation. The de facto standard for such tasks is
ZooKeeper because it is lightweight and has all the functionality required.</p>
        <p>The final algorithm to update keys will be as follows:
1. SaltStack sends a job to the relevant nodes to update their keys if they differ. The job is
sent, for example, twice per day.
2. If the keys on a node differ from the keys in the database, the keys on the node are updated.</p>
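        <p>The update job described above can be sketched as a comparison of fingerprints followed by a conditional copy. This is a minimal sketch under stated assumptions: plain dictionaries stand in for the key directory on a node and for the key table in the database, and the names fingerprint and sync_keys are hypothetical.</p>
<preformat>
```python
import hashlib


def fingerprint(keys: dict) -> str:
    """Digest of a key set that does not depend on dictionary order."""
    h = hashlib.sha256()
    for name in sorted(keys):
        h.update(name.encode("utf-8"))
        h.update(b"\0")
        h.update(keys[name].encode("utf-8"))
        h.update(b"\0")
    return h.hexdigest()


def sync_keys(local: dict, repository: dict) -> bool:
    """Replace the local key set with the repository copy if they differ.

    Returns True when an update was performed.
    """
    if fingerprint(local) == fingerprint(repository):
        return False
    local.clear()
    local.update(repository)
    return True
```
</preformat>
        <p>Running the job repeatedly is harmless: once a node holds the same keys as the repository, sync_keys returns False and changes nothing.</p>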
        <sec id="sec-3-1-1">
          <title>The final algorithm to rotate keys will be as follows:</title>
          <p>─ last_rotation_date is the date of the latest key rotation.
─ rotation_interval is the interval between key rotations.
1. SaltStack checks the last rotation date.
2. If the last rotation date is earlier than the current time minus the rotation interval, SaltStack
sends a job to the relevant nodes to rotate keys.
3. The node tries to acquire a lock in the ZooKeeper cluster.
4. If the lock was acquired, the keys are rotated.
5. If the keys were rotated, they are stored in the keys repository.
6. The lock is released.</p>
          <p>The simplest way to implement steps 3, 4 and 5 of the above algorithm is to use
Apache Curator to manage ZooKeeper. It implements the common algorithms which are used
with ZooKeeper. These algorithms make it possible to scale Keystone instances
without worrying about consistency.</p>
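          <p>The control flow of the rotation algorithm can be sketched as follows. This is not a ZooKeeper client: a threading.Lock stands in for the distributed lock, a dictionary stands in for the shared keys repository, and the function name rotate_keys is hypothetical. Re-checking the date after the lock is taken, and recording the new rotation date, are assumptions added so that two nodes that receive the same job cannot both rotate.</p>
<preformat>
```python
import threading
from datetime import datetime, timedelta


def rotate_keys(state: dict, lock, now: datetime) -> bool:
    """Run steps 2 to 6 on one node; return True if this node rotated."""

    def due() -> bool:
        # Step 2: the last rotation must be at least one interval old.
        return now - state["last_rotation_date"] >= state["rotation_interval"]

    if not due():
        return False
    # Step 3: try to take the global lock without waiting for it.
    if not lock.acquire(blocking=False):
        return False
    try:
        # Another node may have rotated while this one was waiting.
        if not due():
            return False
        # Steps 4 and 5: rotate and publish the new key generation.
        state["generation"] += 1
        state["last_rotation_date"] = now
        return True
    finally:
        # Step 6: the lock is always released.
        lock.release()
```
</preformat>
          <p>With a real ZooKeeper deployment, the lock object would be replaced by a distributed lock that offers the same acquire and release behaviour.</p>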
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Swift scalability solution</title>
        <p>To automate adding capacity gradually, a global lock can be used again, and
ZooKeeper is helpful in this case too. As the same rings have to be present on all nodes, ring files
must be stored in global storage. A database could be used for this purpose, but it
would be an additional, unnecessary technology, and databases are not designed for
this type of task. In fact, Swift itself can be used to store its rings. The only thing
left to do before the first ring update is to store empty rings in OpenStack Swift.
Thus Swift uses itself as storage for its rings. Again, SaltStack is used for
both ring updates and gradual capacity changes on all servers.</p>
        <p>The above technologies are not a requirement; they are only practical recommendations.
It is possible to develop replacements for any of them or to use other technologies
instead. It is also possible to use a distributed file system or synchronization processes
instead of object storage to synchronize rings across the cluster.</p>
        <p>The main algorithm is based on global locks and distributed, highly available
storage. It is important that rings are updated on a single node at any given moment.
Global locks guarantee that only one node modifies the rings at a time.
Distributed storage guarantees that all nodes will eventually be
consistent.</p>
        <p>The final algorithm to update rings will be as follows:
1. SaltStack sends a job to the relevant nodes to update their rings if they differ. The job is
sent, for example, once per hour.
2. If the rings on a node differ from the stored rings, the rings on the node are updated.</p>
        <sec id="sec-3-2-1">
          <title>The final algorithm to change rings will be as follows:</title>
          <p>─ ring_description is the full description of the expected ring.
─ current_ring is the description of the current ring.
─ last_change_date is the date of the latest ring modification.
─ change_interval is the interval between ring modifications.
1. SaltStack compares the ring description with the current ring.
2. If there are any differences between the ring description and the current ring,
SaltStack checks the last change date.
3. If the last change date is earlier than the current time minus the change interval, SaltStack
sends a job to the relevant nodes to change the rings.
4. The node tries to acquire a lock in the ZooKeeper cluster.
5. If the lock was acquired, the rings are updated. For example, if a drive's
weight expresses its capacity in GB, it is possible to add or remove 25 units of weight
every 30 minutes per drive. It is highly recommended to calculate “min part
hours” and set it to the amount of time which replication takes; in this case 1 hour
is enough.
6. If the rings were updated, they are stored in the OpenStack Swift cluster.
7. The lock is released.</p>
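          <p>The gradual weight change in step 5 can be sketched as a function that moves every device weight toward its target by a bounded amount per run. This is an illustration only: the function names next_weights and plan are hypothetical, weights are plain numbers keyed by device name, and a device that should be removed is assumed to appear in the target with weight 0. The step size of 25 follows the example in the text and must be positive.</p>
<preformat>
```python
def next_weights(current: dict, target: dict, max_step: float = 25.0) -> dict:
    """Move each device weight toward its target by at most max_step."""
    result = {}
    for device, goal in target.items():
        now = current.get(device, 0.0)  # a new device starts at weight 0
        if abs(goal - now) > max_step:
            result[device] = now + max_step if goal > now else now - max_step
        else:
            result[device] = goal       # the last step lands on the target
    return result


def plan(current: dict, target: dict, max_step: float = 25.0) -> list:
    """List every intermediate weight set until the target is reached."""
    weights = {device: current.get(device, 0.0) for device in target}
    steps = []
    while weights != target:
        weights = next_weights(weights, target, max_step)
        steps.append(weights)
    return steps
```
</preformat>
          <p>For a new 4 TB drive whose target weight is 4000, plan returns 160 steps. At one step every 30 minutes, the drive reaches its full weight after 80 hours.</p>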
          <p>Again, to manage ZooKeeper it is preferable to use Apache Curator because it implements
the common algorithms which are used with ZooKeeper.</p>
          <p>
            As Apache Curator is a Java library, it would be better to use a Java library for
OpenStack Swift management as well. Unfortunately, at present there is no Java library
(listed on the OpenStack web site [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]) which handles a broken TCP
connection correctly. Testing of all the Java client libraries for OpenStack Swift
showed that none of them has a solution for a broken TCP connection. Those Java libraries
wait until the operating system reports the broken connection.
          </p>
          <p>The logical solution for this type of problem is to use a more widely used
API. The S3 API is much more popular, which is why it has a powerful Java client library
called AWS Java SDK For Amazon S3 (aws-java-sdk-s3).</p>
          <p>Before the S3 API can be used with OpenStack Swift, an
additional module called Swift3 must be installed, which extends Swift with the S3 API. After that, the OpenStack
Swift proxy servers must be configured to serve the S3 API. Then it is possible to
use AWS Java SDK For Amazon S3 directly with the OpenStack Swift cluster.</p>
          <p>These algorithms make it possible to scale a Swift cluster without worrying about either
replication storms or inconsistency.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Scalability estimates</title>
      <sec id="sec-4-1">
        <title>OpenStack Keystone scalability estimates</title>
        <p>OpenStack Keystone scales almost linearly (bandwidth, CPU load, read operations).</p>
        <p>OpenStack Keystone keeps user data in either LDAP or an SQL database, which is
why it is hard to scale write operations well. With fernet tokens, however, no write operations
are needed to create a token because fernet tokens are not stored anywhere.</p>
        <p>All read operations scale linearly because they work with either a cache or a
local database (the rows are not locked). In addition, Keystone instances don't depend
on other Keystone instances or any additional services. Thus all verification processes
scale linearly. CPU load scales linearly because more cores are added with more
Keystone machines. Bandwidth scales linearly because more physical interfaces are
added with more Keystone machines.</p>
        <p>OpenStack Swift scales linearly (bandwidth, CPU load, read operations, write operations).</p>
        <p>Read and write operations scale linearly because added machines don't depend on
other machines. CPU load scales linearly because more cores are added with more
Swift machines. Bandwidth scales linearly because more physical interfaces are
added with more Swift machines.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>Object storage plays an important role in the life of large web projects. It is
widely used to store various kinds of unstructured data.</p>
      <p>It is very important to have automatically scalable object storage if a business
grows rapidly, because it is very hard to scale large storage manually.</p>
      <p>This paper considered a method to automate the scaling of OpenStack Swift, one of the most popular open source
object storage systems.</p>
      <p>Object storage requires authorization. OpenStack Keystone is the de facto standard
for authorization in OpenStack Swift. In multi-datacenter clusters there are
virtually always several Keystone servers to distribute the authorization load, which is why a
method to scale OpenStack Keystone was also researched.</p>
      <p>The main tools used were ZooKeeper and SaltStack. With SaltStack it is possible to
automate these algorithms in a large cluster deployment. ZooKeeper guarantees that
only one node makes changes at a time.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Introduction to Object Storage. http://docs.openstack.org/admin-guide/objectstorageintro.html (access date: February
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. SwiftStack, Inc:
          <article-title>The OpenStack Object Storage system</article-title>
          , February
          <year>2012</year>
          - pp.
          <fpage>6</fpage>
          -
          <lpage>28</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <article-title>Keystone the OpenStack Identity Service</article-title>
          . http://docs.openstack.org/developer/keystone/ (access date: February
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. Keystone tokens. http://docs.openstack.org/admin-guide/identity-tokens.html (access date: February
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Fernet - Frequently Asked Questions. http://docs.openstack.org/admin-guide/identityfernet-token-faq.html (access date: February
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Sam</given-names>
            <surname>Merritt</surname>
          </string-name>
          :
          <article-title>Swift Capacity Management</article-title>
          , April 09,
          <year>2012</year>
          . https://www.swiftstack.com/blog/2012/04/09/swift-capacity-management/ (access date: February 2017)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <article-title>Software Development Kits</article-title>
          . https://wiki.openstack.org/wiki/SDKs (access date: February
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Markus</given-names>
            <surname>Huber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Stewart</given-names>
            <surname>Kowalski</surname>
          </string-name>
          , Marcus Nohlberg, Simon Tjoa:
          <article-title>Towards Automating Social Engineering Using Social Networking Sites</article-title>
          , August
          <year>2009</year>
          - 8 p.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Wei</given-names>
            <surname>Chen</surname>
          </string-name>
          , Yifei Yuan, Li Zhang:
          <article-title>Scalable Influence Maximization in Social Networks under the Linear Threshold Model</article-title>
          , December
          <year>2010</year>
          - 10 p.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Steffen</given-names>
            <surname>Lohmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Dietzold</surname>
          </string-name>
          , Philipp Heim, Norman Heino:
          <article-title>A Web Platform for Social Requirements Engineering</article-title>
          , January
          <year>2009</year>
          - 7 p.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>