=Paper= {{Paper |id=Vol-1787/518-524-paper-90 |storemode=property |title=Accounting system for the computing cluster at IHEP |pdfUrl=https://ceur-ws.org/Vol-1787/518-524-paper-90.pdf |volume=Vol-1787 |authors= Victoria Ezhova,Viktor Kotliar }} ==Accounting system for the computing cluster at IHEP== https://ceur-ws.org/Vol-1787/518-524-paper-90.pdf
     Accounting system for the computing cluster at IHEP
                                  V. V. Ezhova (a), V. V. Kotliar (b)
National Research Centre “Kurchatov Institute” State Research Centre of Russian Federation - Institute for High
                       Energy Physics, RU-142281, Protvino, Moscow region, Russia
                         E-mail: (a) Victoria.Ezhova@ihep.ru, (b) Viktor.Kotliar@ihep.ru


      An accounting system is a very important part of any computing cluster: it shows how and by whom
the core of the cluster, its computing resources, is used. This task becomes more and more complicated
when different groups of users are present and the overall usage has to be calculated in order to produce a bill.
This work describes the development of such a system for the IHEP central computing cluster.
      Keywords: accounting system, IHEP database, estimate of resource usage, LDAP catalog, submission to
Elasticsearch.


                                                                           © 2016 Victoria V. Ezhova, Viktor V. Kotliar




Introduction
      An accounting system for internal and external users has already been developed for the IHEP
website. Internal users do not need a certificate there; they can start tasks directly. Moreover, there
are internal experiments at IHEP, which means that internal users can take part in projects of various
virtual organizations. None of the GRID monitoring systems can trace this, and so the accounting
system has been introduced.
      This work describes some extensions to the existing system, namely an estimate of the cost of the
computing cluster resources used by each user. As there are no legal documents regulating the cost of
particular services of a computing cluster, the owner has the right to set the prices. The prices were
therefore taken from the official page of a large company. Not all of the published prices were used in
the calculation, because in our case the resources of the computing center were provided not to
individuals but to internal users of the cluster.
      As a result, a web interface presenting the results in graphical and text form has been
developed. A script that takes data from LDAP and produces a text file for further work with the
database has also been created.


Account's information overview
     Within the GRID accounting system, a page of the website has been developed for the data connected
with the work of internal users. The following values are analysed:
     cput - maximum operating time of the processor (CPU time);
     mem - the volume of physical memory allocated by the system for storing the data of a user's
        task, in MB;
     vmem - the volume of virtual memory used to run the user's tasks, in MB;
     walltime - the run time of a task, in hours;
     estimate - a result computed from the previous values. It shows the resources spent by
        each user while working on the cluster, expressed as a monetary value.
     These data are taken from the distributed database of the Institute for High Energy Physics.
The last value is generated by a query to the database. As a result, we obtain the list of internal users
with the relevant data about their work on the cluster.




                                   Figure 1. Account's information result




      The same page also shows a diagram of the computing capacity spent by users. Here the
cost is expressed as a percentage of the total expenses. It allows the use of the cluster by users
within the organization to be monitored.




                                     Figure 2. Using cluster's diagram

     A circle expresses the idea of a whole very clearly. Each value is represented by a sector of
the circle whose area corresponds to the contribution of this value to the total amount.
Thus, the full circle is the total cost of running the tasks of internal users.
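
     The percentages behind such a diagram can be sketched as follows. This is a minimal illustration,
not the code of the website: the function name cost_shares and the user names are hypothetical, and the
input is assumed to be a mapping from user to the estimate in rubles.

```python
def cost_shares(estimates):
    """Return each user's share of the total cost, in percent.

    `estimates` maps a user name to that user's estimate in rubles
    (hypothetical input format, not taken from the IHEP database).
    """
    total = sum(estimates.values())
    if total == 0:
        # Nothing was spent in the selected period: every share is zero.
        return {user: 0.0 for user in estimates}
    return {user: 100.0 * cost / total for user, cost in estimates.items()}


if __name__ == "__main__":
    # Hypothetical users and costs, for illustration only.
    for user, share in cost_shares({"user_a": 30.0, "user_b": 10.0}).items():
        print(f"{user}: {share:.1f}%")
```

     The shares always add up to 100 percent, which is why they map directly onto the sectors of the circle.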
     The user of the system can choose any period of time and look at the shares of cluster usage.
The calculation of expenses makes it possible to manage the activity of IT divisions by treating the
processes that deliver a set of IT services as the object of management. This is a direct type of expense.
In the future it can become the basis for a service-oriented model of cost calculation.


Computing
      Cloud computing has been the basis for these calculations. Cloud computing, also known as
on-demand computing, is a kind of Internet-based computing that provides shared processing resources
and data to computers and other devices on demand. It is a model for enabling ubiquitous, on-demand
access to a shared pool of configurable computing resources. Consumers of cloud computing can
considerably reduce their expenses on information technology infrastructure.




                                   Figure 3. Start of tasks of two groups of users

     Cloud providers claim cost reductions. A public-cloud delivery model converts capital
expenditure to operational expenditure. This purportedly lowers barriers to entry, as infrastructure is
typically provided by a third party and need not be purchased for one-time or infrequent intensive
computing tasks. The service provider automatically measures the resources consumed at a
certain level of abstraction (for example, the volume of stored data, capacity, the number of users,
the number of transactions), and on the basis of these data estimates the volume of
services provided to consumers.
     From the point of view of the supplier, cloud computing brings economies of scale, since it needs
fewer hardware resources than would be required if hardware capacity were dedicated to each consumer.
     Grid computing is a form of distributed and parallel computing, whereby a 'super and virtual
computer' is composed of a cluster of networked, loosely coupled computers acting in concert to
perform very large tasks.
     By using cloud computing, consumers of information technologies can significantly cut capital
expenses for building data-processing centers and purchasing server and network hardware.
     The calculation is based on the price list of the Selectel company; the source is given in the
list of references. The table of prices for the resources of the cluster is given
below.

                                         Table 1. Table of cost of resources


         Resource                                                   Price in rubles
         Processor time (CPU time)                                  0.9 per hour
         Random access memory (RAM)                                 0.58 per 1 GB per hour
         Storage of information                                     9 per 1 TB per hour
         Volume read/written                                        0.1 per 1 GB


In general, the final formula is the sum of four terms:
\[
\mathrm{estimate} = U_{\mathrm{vmem}} \cdot C_{\mathrm{RAM/hour}}
  + U_{\mathrm{mem}} \cdot C_{\mathrm{read/written}}
  + U_{\mathrm{cput/hour}} \cdot C_{\mathrm{CPU/hour}}
  + U_{\mathrm{walltime/hour}} \cdot U_{\mathrm{mem}} \cdot C_{\mathrm{storage\ 1TB/hour}}
\]
where: $U_{\mathrm{vmem}}$ - used virtual memory;
        $C_{\mathrm{RAM/hour}}$ - cost of RAM for one hour;
        $U_{\mathrm{mem}}$ - used memory;
        $C_{\mathrm{read/written}}$ - cost of the volume read/written;
        $U_{\mathrm{cput/hour}}$ - used cput, in hours;
        $C_{\mathrm{CPU/hour}}$ - cost of CPU for one hour;
        $U_{\mathrm{walltime/hour}}$ - used walltime, in hours;
        $C_{\mathrm{storage\ 1TB/hour}}$ - cost of storing 1 TB of information for one hour.
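
The formula can be sketched in Python as shown below. This is an illustration under stated
assumptions, not the query used in the IHEP database: the prices come from Table 1, while the names
Usage and estimate and the unit conversions (mem and vmem given in MB, 1 GB = 1024 MB,
1 TB = 1024 GB) are assumptions made here, since the text does not state how units are converted.

```python
from dataclasses import dataclass

# Prices in rubles, taken from Table 1.
PRICE_CPU_PER_HOUR = 0.9
PRICE_RAM_PER_GB_HOUR = 0.58
PRICE_STORAGE_PER_TB_HOUR = 9.0
PRICE_IO_PER_GB = 0.1

# Assumed unit conversions (not stated in the text).
MB_PER_GB = 1024.0
MB_PER_TB = 1024.0 * 1024.0


@dataclass
class Usage:
    """Resources used by one user (hypothetical container)."""
    cput_hours: float      # CPU time, in hours
    walltime_hours: float  # wall-clock time, in hours
    mem_mb: float          # physical memory, in MB
    vmem_mb: float         # virtual memory, in MB


def estimate(usage: Usage) -> float:
    """Sum the four terms of the formula, in the same order."""
    ram_term = (usage.vmem_mb / MB_PER_GB) * PRICE_RAM_PER_GB_HOUR
    io_term = (usage.mem_mb / MB_PER_GB) * PRICE_IO_PER_GB
    cpu_term = usage.cput_hours * PRICE_CPU_PER_HOUR
    storage_term = (usage.walltime_hours
                    * (usage.mem_mb / MB_PER_TB)
                    * PRICE_STORAGE_PER_TB_HOUR)
    return ram_term + io_term + cpu_term + storage_term


if __name__ == "__main__":
    # Hypothetical usage figures, for illustration only.
    print(estimate(Usage(cput_hours=10, walltime_hours=12,
                         mem_mb=2048, vmem_mb=4096)))
```

With these assumed conversions, the CPU term dominates the estimate for typical batch jobs, because the
storage price applies per terabyte while job memory is measured in megabytes.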


Data base overview
     The database is built on the MySQL RDBMS. Perl and C scripts take data from the PBS
logs and put them into the database. The scheme of the database, which consists of a set of
tables, is given below. New entries are added to the database automatically once a day.




                                     Figure 4. Structure of IHEP database

      The table EMPLOYEES contains the list of internal users. The table is built from the
data contained in a text file, which a script fills with data taken from the LDAP
server.
      The table DEPARTAMENTS contains the list of all departments of the institute. These two tables
contain all the necessary information about internal users. They are updated once a day. The PBS logs are
compared with the table EMPLOYEES, and values such as cput, mem, vmem and walltime are recorded.
Information about internal users is stored in IHEP_archive, and information about external users in
GRID_archive. In the table IHEP resources are divided by department, and in the table IHEP1 resources
are divided by user. In the table GRID resources are sorted by virtual organization, and in GRID1
resources are sorted in more detail, by day.
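
      The comparison of PBS logs with the list of employees can be sketched as follows. The production
scripts are written in Perl and C; this Python fragment only illustrates the idea. It assumes the usual
Torque accounting record layout (timestamp;record type;job id;space-separated key=value pairs), with
memory reported in kb and times as HH:MM:SS. The function names, the sample record and the user names
are hypothetical.

```python
def hms_to_hours(text):
    """Convert 'HH:MM:SS' to hours."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours + minutes / 60 + seconds / 3600


def kb_to_mb(text):
    """Convert a value such as '204800kb' to MB."""
    if not text.lower().endswith("kb"):
        raise ValueError("expected a value in kb: %r" % text)
    return int(text[:-2]) / 1024


def parse_end_record(line):
    """Parse one accounting line; return None unless it is a job-end ('E') record."""
    parts = line.rstrip("\n").split(";", 3)
    if len(parts) != 4 or parts[1] != "E":
        return None
    job_id, message = parts[2], parts[3]
    # Naive split: assumes that values contain no spaces.
    fields = dict(item.split("=", 1) for item in message.split() if "=" in item)
    return {
        "job_id": job_id,
        "user": fields["user"],
        "cput_hours": hms_to_hours(fields["resources_used.cput"]),
        "walltime_hours": hms_to_hours(fields["resources_used.walltime"]),
        "mem_mb": kb_to_mb(fields["resources_used.mem"]),
        "vmem_mb": kb_to_mb(fields["resources_used.vmem"]),
    }


def archive_for(user, employees):
    """Internal users go to IHEP_archive, all others to GRID_archive."""
    return "IHEP_archive" if user in employees else "GRID_archive"


if __name__ == "__main__":
    sample = ("04/12/2016 10:15:30;E;12345.example;user=ivanov group=users "
              "resources_used.cput=01:30:00 resources_used.mem=204800kb "
              "resources_used.vmem=409600kb resources_used.walltime=02:00:00")
    record = parse_end_record(sample)
    print(record, archive_for(record["user"], {"ivanov"}))
```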


LDAP
      The Lightweight Directory Access Protocol (LDAP) is an open, vendor-neutral, industry standard
application protocol for accessing and maintaining distributed directory information services over an
Internet Protocol (IP) network. Every entry in the LDAP catalog consists of one or several attributes
and possesses a unique name. For example, in the entry "dn: cn=admin,dc=ihep,dc=su", "dn" is the
distinguished name of the entry; it is neither an attribute nor a part of the entry. "cn=admin" is the
entry's RDN (Relative Distinguished Name), and "dc=ihep,dc=su" is the DN of the parent entry,
where "dc" denotes 'Domain Component'. The other lines of an entry show its attributes. Attribute
names are typically mnemonic strings, like "cn" for common name, "dc" for domain component,
"mail" for e-mail address, and "sn" for surname. A server may also hold references to other servers and
return a referral for entries it does not store; the client can then contact the other server.




     LDAP is a widely used standard for access to directory services. The OpenLDAP server is the best
known of the freely distributed open-source implementations. The OpenLDAP installation at IHEP is shown
in Figure 5. It consists of two OpenLDAP servers reachable through a DNS round-robin mechanism.
There is a master server, where all modifications of the LDAP database are made, and a slave server, to
which all modifications from the master server are replicated. Modifications of the LDAP database
are allowed only to the admin user authenticated via Kerberos.




                               Figure 5. The OpenLDAP installation at IHEP

      Writing programs that work with LDAP becomes very simple with Python and
python-ldap. The python-ldap package provides an object-oriented API to access LDAP directory servers
from Python programs. The working program is written in Python 2. The script searches the
LDAP server and produces a text file containing the names of users, using the base DN "ou=users,
dc=ihep, dc=su", the filter '(objectclass=ihepuser)' and attrs = ['uid']. The hierarchy can be
presented in the form:




                           Figure 6. Hierarchy of the users in the LDAP system




     As mentioned earlier, IHEP users have the objectClass value ihepuser; external users do not
have it. The database is subsequently filled on the basis of these data.
     In every session some tables of the database are dropped and built again. The table
EMPLOYEES is filled with data from the LDAP server on the basis of the users.txt file.
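
     The LDAP search and the users.txt file described above can be sketched with python-ldap as
follows. This is not the authors' script: the base DN, filter and attribute list come from the text,
while the server URI, the anonymous bind, the subtree search scope and the function names are
assumptions made for illustration.

```python
SCOPE_SUBTREE = 2  # same value as ldap.SCOPE_SUBTREE; subtree scope is an assumption


def fetch_uids(conn, base_dn="ou=users,dc=ihep,dc=su"):
    """Return the sorted uid values of all entries with objectclass=ihepuser.

    `conn` is any object with a python-ldap style search_s() method.
    """
    entries = conn.search_s(base_dn, SCOPE_SUBTREE,
                            "(objectclass=ihepuser)", ["uid"])
    uids = []
    for dn, attrs in entries:
        if dn is None:
            # Search references carry no entry; skip them.
            continue
        for value in attrs.get("uid", []):
            # python-ldap 3.x returns attribute values as bytes.
            if isinstance(value, bytes):
                value = value.decode("utf-8")
            uids.append(value)
    return sorted(uids)


def write_users_file(uids, path="users.txt"):
    """Write one user name per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for uid in uids:
            handle.write(uid + "\n")


def main(uri="ldap://ldap.example.org"):  # hypothetical server URI
    import ldap  # provided by the python-ldap package

    conn = ldap.initialize(uri)
    conn.simple_bind_s()  # anonymous bind (assumption)
    try:
        write_users_file(fetch_uids(conn))
    finally:
        conn.unbind_s()
```

     Keeping the search in a function that receives the connection makes it possible to check the
filtering logic without a running LDAP server.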


Elasticsearch and Kibana
       In the future, the accounting information from the cluster's Torque PBS will be loaded into ES
(Elasticsearch) for real-time analysis. A Python script will be written to
parse the Torque PBS accounting file and send this information to ES.
      Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable
full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch
can be used to search all kinds of documents. Displaying the data loaded into ES is a job for
Kibana.
      Kibana is an open source (Apache Licensed), browser-based analytics and search dashboard for
Elasticsearch. In other words, Kibana is an open source data visualization plugin for Elasticsearch.
      The script will load information from the PBS logs into ES, and Kibana will then display it as
charts. This is one of the main tasks in building a unified monitoring system for the
IHEP computing center.
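
      Since the script is still planned, the following is only a sketch of how parsed accounting records
could be sent to ES through its bulk HTTP interface. The index name, the URL and the record fields
are hypothetical, and the exact request details depend on the Elasticsearch version (older releases,
for example, also expect a document type in the action line).

```python
import json
import urllib.request


def build_bulk_body(records, index="pbs-accounting"):
    """Build a newline-delimited JSON body for the Elasticsearch bulk API.

    Each record is preceded by an action line naming the target index.
    The index name is a hypothetical example.
    """
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(record, sort_keys=True))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


def send_bulk(body, url="http://localhost:9200/_bulk"):  # hypothetical URL
    """POST the bulk body to Elasticsearch and return the parsed response."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    print(build_bulk_body([{"user": "ivanov", "cput_hours": 1.5}]))
```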


Conclusions
      The accounting system has been developed under Linux over several years and is successfully
used at the Institute for High Energy Physics. It helps to collect statistics about the resources of the
cluster and makes it possible to trace who uses how many resources. In this work the use of cluster
resources is given a monetary value, which is shown on the chart as the share of each internal user.
Creating an accounting system in IT and policies of mutual settlements with consumers of
IT services (charging) is a complex task which has to be coordinated with the accounting policies
adopted by the company.


References
Calculation of cost of services of the cloud server [Electronic resource]:
     https://selectel.ru/services/cloud/#price
The homepage of accounting for a cluster [Electronic resource]:
     http://farmacnt.ihep.su/accounting/index.php
Result of a request. Appeals to a cluster of internal users [Electronic resource]:
     http://farmacnt.ihep.su/accounting/user_result_table.php
PHP Select Data From MySQL [w3schools.com] [Electronic resource]:
     http://www.w3schools.com/php/php_mysql_select.asp
LDAP in Python [Electronic resource] (In Russ.):
     https://www.komtet.ru/lib/plangs/python/ldap-v-python
Configuring and securing PYTHON LDAP Applications [Mastering OpenLDAP: Configuring,
     Securing and Integrating Directory Services 2007] [Electronic resource]:
     https://www.packtpub.com/books/content/python-ldap-applications-part-1-installing-and-configuring-python-ldap-library-and-bin



