Accounting system for the computing cluster at IHEP

V. V. Ezhova (a), V. V. Kotliar (b)

National Research Centre “Kurchatov Institute”, State Research Centre of Russian Federation - Institute for High Energy Physics, RU-142281, Protvino, Moscow region, Russia

E-mail: (a) Victoria.Ezhova@ihep.ru, (b) Viktor.Kotliar@ihep.ru

An accounting system is a very important part of any computing cluster: it shows how, and by whom, the core of the cluster, its computing resources, is used. The task becomes more and more complicated when several groups of users share the cluster and their total usage has to be calculated in order to issue a bill. This work describes the development of such a system for the IHEP central computing cluster.

Keywords: accounting system, IHEP database, estimate of resource usage, LDAP directory, submission to Elasticsearch.

© 2016 Victoria V. Ezhova, Viktor V. Kotliar

Introduction

An accounting system for both internal and external users was originally developed for the IHEP website. Internal users do not need a certificate: they can submit tasks to the cluster directly. In addition, IHEP runs its own internal experiments, and internal users can take part in projects of various virtual organizations. None of the GRID monitoring systems can track this usage, which is why the accounting system was introduced. This work describes an addition to the existing system: the cost of the cluster resources used by each user. Since no legal documents regulate the cost of particular computing-cluster services, the owner is entitled to set the prices, so the price data were taken from the official web page of a large company (Selectel). Not all of its price items were used in our calculation, because the resources of the computing centre are provided not to individuals but to internal users of the cluster.
As a result, a web interface presenting the results in graphical and text form has been developed, together with a script that takes data from LDAP and writes a text file used later to populate the database.

Account's information overview

In addition to the GRID accounting systems, a web page has been developed for the data related to the work of internal users. The following quantities are analysed there:
- cput: processor time of the task (CPU time);
- mem: physical memory, in MB, allocated by the system for the user's task;
- vmem: virtual memory, in MB, used to run the user's task;
- walltime: duration of the task, in hours;
- estimate: the cost calculated from the preceding values.

The estimate expresses, in monetary terms, the resources each user consumed while working on the cluster. The input data are taken from the distributed database of the Institute for High Energy Physics, and the estimate itself is produced by a database query. The result is a list of internal users together with their cluster-usage data.

Figure 1. Accounting information results

The same page also shows a chart of the computing resources consumed by users, with each user's cost expressed as a percentage of the total. This makes it possible to monitor how the cluster is used across the organization.

Figure 2. Cluster usage diagram

A pie chart conveys the overall picture very clearly: each value is drawn as a sector whose area is proportional to its contribution to the total, so the whole circle represents the total cost of the tasks run by internal users. The user of the system can choose any time period and view the corresponding distribution of cluster usage. Calculating expenses makes it possible to manage the activity of IT divisions, treating the provision of a set of IT services as the object of management.
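The percentage shares plotted in Figure 2 follow from the per-user estimates by dividing each estimate by their sum. The sketch below is illustrative only: the function name `cost_shares` and the sample user names are hypothetical, and the real page computes its values from the MySQL database.

```python
from typing import Dict


def cost_shares(estimates: Dict[str, float]) -> Dict[str, float]:
    """Return each user's share of the total cost, in percent.

    `estimates` maps a (hypothetical) user name to that user's cost
    estimate for the chosen time period.
    """
    total = sum(estimates.values())
    if total == 0:
        # No usage in the chosen period: every share is zero.
        return {user: 0.0 for user in estimates}
    return {user: 100.0 * cost / total for user, cost in estimates.items()}


if __name__ == "__main__":
    shares = cost_shares({"user_a": 30.0, "user_b": 10.0})
    for user, pct in sorted(shares.items()):
        print(f"{user}: {pct:.1f}%")
```

Each value returned is the sector size for one user, so the shares always add up to the full circle (100%).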
These are direct expenses. In the future, this calculation could become the basis of a service-oriented cost model.

Computing

Cloud computing has been the basis for these calculations. Cloud computing, also known as on-demand computing, is a kind of Internet-based computing that provides shared processing resources and data to computers and other devices on demand. It is a model for enabling ubiquitous, on-demand access to a shared pool of configurable computing resources. Consumers of cloud computing can considerably reduce their spending on IT infrastructure.

Figure 3. Task submission by two groups of users

Cloud providers claim cost reductions: a public-cloud delivery model converts capital expenditure into operational expenditure. This purportedly lowers barriers to entry, as the infrastructure is typically provided by a third party and need not be purchased for one-time or infrequent intensive computing tasks. The service provider automatically measures the resources consumed at a certain level of abstraction (for example, the volume of stored data, capacity, the number of users, or the number of transactions) and, on the basis of these measurements, estimates the volume of services provided to consumers. From the supplier's point of view, cloud computing provides economies of scale: fewer hardware resources are needed than if dedicated hardware were allocated to each consumer. Grid computing is a form of distributed and parallel computing in which a 'super and virtual computer' is composed of a cluster of networked, loosely coupled computers acting in concert to perform very large tasks. By using cloud computing, IT consumers can significantly cut the capital expenses of building data-processing centres and purchasing server and network hardware. The cost calculation is based on the prices of the Selectel company.
The source is given in the list of references. The prices of the cluster resources are given in Table 1.

Table 1. Cost of resources

Resource                       Price, rubles
Processor time (CPU time)      0.9 per hour
Random access memory (RAM)     0.58 per 1 GB per hour
Information storage            9 per 1 TB per hour
Read/written volume            0.1 per 1 GB

Overall, the estimate is the sum of the following terms:

$$E = U_{vmem}\,C_{RAM/hour} + U_{mem}\,C_{read/written} + U_{cput/hour}\,C_{CPU/hour} + U_{walltime/hour}\,U_{mem}\,C_{storage\,1TB/hour}$$

where:
$U_{vmem}$ – used virtual memory;
$C_{RAM/hour}$ – cost of RAM for one hour;
$U_{mem}$ – used memory;
$C_{read/written}$ – cost of the read/written volume;
$U_{cput/hour}$ – used CPU time, in hours;
$C_{CPU/hour}$ – cost of CPU for one hour;
$U_{walltime/hour}$ – used walltime, in hours;
$C_{storage\,1TB/hour}$ – cost of storing 1 TB of information for one hour.

Data base overview

The database is implemented with the MySQL RDBMS. Perl and C scripts extract data from the PBS logs and load them into the database. The database schema, which consists of a set of tables, is shown in Figure 4. New entries are added to the database automatically once a day.

Figure 4. Structure of IHEP database

The table EMPLOYEES contains the list of internal users. It is built from a text file that a script fills with data taken from the LDAP server. The table DEPARTAMENTS contains the list of all departments of the institute. Together these two tables hold all the necessary information about internal users; both are updated once a day. Records from the PBS logs, with their cput, mem, vmem and walltime values, are matched against the table EMPLOYEES: information about internal users is stored in IHEP_archive, and information about external users in GRID_archive. In the table IHEP the resources are grouped by department, and in the table IHEP1 by user.
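The estimate formula with the Table 1 prices can be written directly as a function. This is a minimal sketch, assuming that mem and vmem are given in MB (as on the accounting page), that cput and walltime are already converted to hours, and that 1 GB = 1024 MB and 1 TB = 1024 GB; the source does not state these unit conversions, and the function and constant names are hypothetical. The four terms follow the formula one-to-one, including the storage term, which the formula defines as walltime multiplied by used memory.

```python
# Prices from Table 1, in rubles.
PRICE_CPU_PER_HOUR = 0.9          # C_CPU/hour
PRICE_RAM_PER_GB_HOUR = 0.58      # C_RAM/hour, per 1 GB
PRICE_RW_PER_GB = 0.1             # C_read/written, per 1 GB
PRICE_STORAGE_PER_TB_HOUR = 9.0   # C_storage 1TB/hour, per 1 TB

# Assumed binary units for converting MB values.
MB_PER_GB = 1024
MB_PER_TB = 1024 * 1024


def estimate_cost(cput_h: float, mem_mb: float, vmem_mb: float, walltime_h: float) -> float:
    """Cost estimate in rubles, term by term as in the formula."""
    vmem_gb = vmem_mb / MB_PER_GB
    mem_gb = mem_mb / MB_PER_GB
    mem_tb = mem_mb / MB_PER_TB
    ram_term = vmem_gb * PRICE_RAM_PER_GB_HOUR      # U_vmem * C_RAM/hour
    rw_term = mem_gb * PRICE_RW_PER_GB              # U_mem * C_read/written
    cpu_term = cput_h * PRICE_CPU_PER_HOUR          # U_cput/hour * C_CPU/hour
    # U_walltime/hour * U_mem * C_storage 1TB/hour
    storage_term = walltime_h * mem_tb * PRICE_STORAGE_PER_TB_HOUR
    return ram_term + rw_term + cpu_term + storage_term


if __name__ == "__main__":
    # One hour of CPU and walltime with 1 GB of memory and 1 GB of virtual memory.
    print(round(estimate_cost(1.0, 1024, 1024, 1.0), 4))
```

With these assumptions the CPU and RAM terms dominate; the storage term only becomes noticeable for long jobs with large memory footprints.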
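The routing of PBS-log records through EMPLOYEES into IHEP_archive or GRID_archive, and the per-department and per-user grouping of the IHEP and IHEP1 tables, can be illustrated with Python's built-in sqlite3 module, used here only as a self-contained stand-in for MySQL. All column names and sample records are hypothetical, the aggregated tables are shown as queries rather than stored tables, and the real loaders are the Perl and C scripts mentioned above.

```python
import sqlite3

# Hypothetical, simplified columns; the real schema is shown in Figure 4.
SCHEMA = """
CREATE TABLE DEPARTAMENTS (dep_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE EMPLOYEES (login TEXT PRIMARY KEY, dep_id INTEGER);
CREATE TABLE IHEP_archive (login TEXT, cput REAL, mem REAL, vmem REAL, walltime REAL);
CREATE TABLE GRID_archive (login TEXT, cput REAL, mem REAL, vmem REAL, walltime REAL);
"""


def load_records(conn, records):
    """Route records: logins found in EMPLOYEES go to IHEP_archive, the rest to GRID_archive."""
    internal = {row[0] for row in conn.execute("SELECT login FROM EMPLOYEES")}
    for rec in records:
        # The table name is one of two fixed strings, so formatting it into SQL is safe.
        table = "IHEP_archive" if rec[0] in internal else "GRID_archive"
        conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?)", rec)


def usage_by_department(conn):
    """Internal usage summed per department (the role of the IHEP table)."""
    return conn.execute(
        """SELECT d.name, SUM(a.cput), SUM(a.walltime)
           FROM IHEP_archive AS a
           JOIN EMPLOYEES AS e ON e.login = a.login
           JOIN DEPARTAMENTS AS d ON d.dep_id = e.dep_id
           GROUP BY d.name ORDER BY d.name"""
    ).fetchall()


def usage_by_user(conn):
    """Internal usage summed per user (the role of the IHEP1 table)."""
    return conn.execute(
        "SELECT login, SUM(cput), SUM(walltime) FROM IHEP_archive "
        "GROUP BY login ORDER BY login"
    ).fetchall()


def build_demo():
    """Create an in-memory database filled with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO DEPARTAMENTS VALUES (?, ?)", [(1, "dep_a"), (2, "dep_b")])
    conn.executemany("INSERT INTO EMPLOYEES VALUES (?, ?)", [("alice", 1), ("bob", 2), ("carol", 1)])
    # (login, cput [h], mem [MB], vmem [MB], walltime [h])
    load_records(conn, [
        ("alice", 1.0, 100.0, 200.0, 2.0),
        ("carol", 0.5, 50.0, 80.0, 1.0),
        ("griduser", 3.0, 300.0, 600.0, 4.0),
    ])
    return conn


if __name__ == "__main__":
    demo = build_demo()
    print(usage_by_department(demo))  # per-department totals
    print(usage_by_user(demo))        # per-user totals
```

With the sample data, alice and carol land in IHEP_archive and griduser in GRID_archive; the department query returns a single row for dep_a because bob has no jobs.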
In the table GRID the resources are grouped by virtual organization, and in GRID1 they are broken down in more detail, by day.

LDAP

The Lightweight Directory Access Protocol (LDAP) is an open, vendor-neutral, industry-standard application protocol for accessing and maintaining distributed directory information services over an Internet Protocol (IP) network. Every entry in an LDAP directory consists of one or more attributes and has a unique name. For example, in the entry

dn: cn=admin,dc=ihep,dc=su

"dn" is the distinguished name of the entry; it is neither an attribute nor a part of the entry. "cn=admin" is the entry's RDN (Relative Distinguished Name), and "dc=ihep,dc=su" is the DN of the parent entry, where "dc" denotes 'Domain Component'. The remaining lines of an entry list its attributes. Attribute names are typically mnemonic strings, such as "cn" for common name, "dc" for domain component, "mail" for e-mail address and "sn" for surname. A server that does not hold a requested entry can return a referral to another server, and the client can then contact that other server.

LDAP is a widely used standard for access to directory services, and OpenLDAP is its best-known free, open-source implementation. The OpenLDAP installation at IHEP is shown in Figure 5. It consists of two OpenLDAP servers reachable through a DNS round-robin mechanism: a master server, on which all modifications of the LDAP database are made, and a slave server, to which all modifications from the master are replicated. Modifications of the LDAP database are allowed only to the admin user authenticated via Kerberos.

Figure 5. The OpenLDAP installation at IHEP

Writing programs that work with LDAP is very simple with Python: the python-ldap package provides an object-oriented API for accessing LDAP directory servers from Python programs. The working program is written in Python 2.
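The DN structure described above can be illustrated with a small, simplified parser that splits a DN into its RDN and the DN of the parent entry, and joins the "dc" components back into a domain name. It is a sketch only: it ignores escaped commas and multi-valued ('+') RDNs, which a real program should leave to python-ldap's `ldap.dn.str2dn()`; the function names `split_dn` and `domain_of` are hypothetical.

```python
from typing import List, Tuple


def split_dn(dn: str) -> Tuple[Tuple[str, str], str]:
    """Split a DN into (RDN as (attribute, value), parent DN).

    Simplified: assumes no escaped commas and no '+' multi-valued RDNs.
    """
    parts: List[str] = [p.strip() for p in dn.split(",")]
    attr, _, value = parts[0].partition("=")
    return (attr.strip().lower(), value.strip()), ",".join(parts[1:])


def domain_of(dn: str) -> str:
    """Join the dc= (Domain Component) parts of a DN into a DNS-style name."""
    labels = []
    for part in dn.split(","):
        attr, _, value = part.strip().partition("=")
        if attr.strip().lower() == "dc":
            labels.append(value.strip())
    return ".".join(labels)


if __name__ == "__main__":
    print(split_dn("cn=admin,dc=ihep,dc=su"))
    print(domain_of("cn=admin,dc=ihep,dc=su"))
```

For the example entry, the RDN is ("cn", "admin"), the parent DN is "dc=ihep,dc=su", and the domain components spell out ihep.su.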
This script queries the LDAP server and writes a text file containing the names of users, using the search base "ou=users, dc=ihep, dc=su", the filter '(objectclass=ihepuser)' and attrs = ['uid']. The user hierarchy is shown in Figure 6.

Figure 6. Hierarchy of the users in the LDAP system

As mentioned earlier, IHEP users have the objectClass attribute value ihepuser; external users do not. These data are then used to fill the database. In every session some of the database tables are dropped and rebuilt; the table EMPLOYEES is filled with the data from the LDAP server via the users.txt file.

Elasticsearch and Kibana

Next, the accounting information from the cluster's Torque PBS will be loaded into ES (Elasticsearch) for real-time analysis. A Python script will be written that parses the Torque PBS accounting file and sends this information to ES. Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents, and it can be used to search all kinds of documents. Displaying the data loaded into ES is the job of Kibana, an open-source (Apache-licensed), browser-based analytics and search dashboard for Elasticsearch; in other words, a data visualization plugin for Elasticsearch. The script will collect information from the PBS logs into ES, and it will then be displayed as graphs in Kibana. This is one of the main tasks in building a unified monitoring system for the IHEP computing centre.

Conclusions

The accounting system has been developed under Linux over several years and is successfully used at the Institute for High Energy Physics. It collects statistics on the cluster resources and makes it possible to trace who uses how many resources.
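A script of the kind described, searching "ou=users,dc=ihep,dc=su" with the filter (objectclass=ihepuser), requesting only the uid attribute and writing the values to users.txt, might look like the sketch below. It targets Python 3 and python-ldap 3.x rather than the Python 2 of the original program; the server URI is hypothetical, the subtree search scope and anonymous read access are assumptions, and the helper names are illustrative. `ldap` is imported inside `fetch_entries` so that the result-processing part can be used without the package installed.

```python
from typing import Any, Iterable, List, Optional, Tuple

BASE_DN = "ou=users,dc=ihep,dc=su"
SEARCH_FILTER = "(objectclass=ihepuser)"
ATTRS = ["uid"]
LDAP_URI = "ldap://ldap.example.ihep.su"  # hypothetical server address


def fetch_entries(uri: str = LDAP_URI) -> List[Tuple[Optional[str], Any]]:
    """Run the search with python-ldap and return its (dn, attributes) results."""
    import ldap  # python-ldap, imported lazily

    conn = ldap.initialize(uri)
    try:
        return conn.search_s(BASE_DN, ldap.SCOPE_SUBTREE, SEARCH_FILTER, ATTRS)
    finally:
        conn.unbind_s()


def uids_from_entries(entries: Iterable[Tuple[Optional[str], Any]]) -> List[str]:
    """Extract sorted uid values; python-ldap 3 returns attribute values as bytes."""
    uids = []
    for dn, attrs in entries:
        if dn is None:  # skip search references (referrals), which carry no attributes
            continue
        for value in attrs.get("uid", []):
            uids.append(value.decode("utf-8") if isinstance(value, bytes) else value)
    return sorted(uids)


def write_users_file(uids: Iterable[str], path: str = "users.txt") -> None:
    """Write one uid per line: the text file later loaded into EMPLOYEES."""
    with open(path, "w", encoding="utf-8") as fh:
        for uid in uids:
            fh.write(uid + "\n")
```

A daily job would then run `write_users_file(uids_from_entries(fetch_entries()))`. Keeping the search, the extraction and the file writing in separate functions lets the extraction step be checked against sample results without a live LDAP server.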
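The planned loader can be sketched as follows: it parses the end-of-job ("E") records of the Torque PBS accounting file and turns them into a request body for the Elasticsearch _bulk API. This is an illustration under assumptions, not the planned script itself: it assumes Torque's `MM/DD/YYYY HH:MM:SS;type;job_id;key=value ...` record layout with memory reported in kb, the index name `torque-accounting`, the field names and the localhost URL are hypothetical, and the action line omits a mapping type, as recent Elasticsearch versions expect.

```python
import json
import urllib.request
from datetime import datetime
from typing import Dict, Iterable, List, Optional


def hms_to_hours(value: str) -> float:
    """Convert a Torque HH:MM:SS duration to hours."""
    h, m, s = (int(x) for x in value.split(":"))
    return h + m / 60 + s / 3600


def kb_to_mb(value: str) -> float:
    """Convert a value such as '204800kb' to MB (assumes a kb suffix)."""
    number = value[:-2] if value.lower().endswith("kb") else value
    return int(number) / 1024


def parse_record(line: str) -> Optional[Dict[str, object]]:
    """Parse one accounting line; return a document for 'E' records, else None."""
    parts = line.rstrip("\n").split(";", 3)
    if len(parts) != 4 or parts[1] != "E":
        return None
    stamp, _, job_id, message = parts
    # 'key=value key=value ...' -> {key: value}
    fields = dict(item.partition("=")[::2] for item in message.split())
    return {
        "@timestamp": datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S").isoformat(),
        "job_id": job_id,
        "user": fields.get("user"),
        "group": fields.get("group"),
        "cput_hours": hms_to_hours(fields.get("resources_used.cput", "0:0:0")),
        "mem_mb": kb_to_mb(fields.get("resources_used.mem", "0kb")),
        "vmem_mb": kb_to_mb(fields.get("resources_used.vmem", "0kb")),
        "walltime_hours": hms_to_hours(fields.get("resources_used.walltime", "0:0:0")),
    }


def bulk_body(lines: Iterable[str], index: str = "torque-accounting") -> str:
    """Build an NDJSON body for POST /_bulk: an action line, then the document."""
    out: List[str] = []
    for line in lines:
        doc = parse_record(line)
        if doc is not None:
            out.append(json.dumps({"index": {"_index": index}}))
            out.append(json.dumps(doc))
    return "\n".join(out) + "\n" if out else ""


def send_bulk(body: str, url: str = "http://localhost:9200/_bulk") -> bytes:
    """POST the body to Elasticsearch and return the raw JSON response."""
    req = urllib.request.Request(
        url, data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"}, method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Converting durations to hours and memory to MB at load time keeps the indexed fields in the same units as the accounting page, so Kibana charts can be compared directly with the estimates.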
Here the use of cluster resources is given a monetary value, which is shown on the chart as the share of each internal user. Building an accounting system for IT and a policy of mutual settlements with consumers of IT services (charging) is a complex task that has to be coordinated with the accounting policies adopted in the organization.

References

Calculation of the cost of cloud server services [Electronic resource]: https://selectel.ru/services/cloud/#price
The homepage of accounting for a cluster [Electronic resource]: http://farmacnt.ihep.su/accounting/index.php
Result of a request: cluster usage by internal users [Electronic resource]: http://farmacnt.ihep.su/accounting/user_result_table.php
PHP Select Data From MySQL [w3schools.com] [Electronic resource]: http://www.w3schools.com/php/php_mysql_select.asp
LDAP in Python (in Russian) [Electronic resource]: https://www.komtet.ru/lib/plangs/python/ldap-v-python
Configuring and securing Python LDAP applications [Mastering OpenLDAP: Configuring, Securing and Integrating Directory Services, 2007] [Electronic resource]: https://www.packtpub.com/books/content/python-ldap-applications-part-1-installing-and-configuring-python-ldap-library-and-bin