=Paper=
{{Paper
|id=Vol-2064/paper27
|storemode=property
|title=
Neural network model for identification virtual network functions in multi-cloud platform and algorithmic solutions to optimize network work in the infrastructure of the virtual data center
|pdfUrl=https://ceur-ws.org/Vol-2064/paper27.pdf
|volume=Vol-2064
|authors=Denis Parfenov,Irina Bolodurina,Yury Ushakov
}}
==
Neural network model for identification virtual network functions in multi-cloud platform and algorithmic solutions to optimize network work in the infrastructure of the virtual data center
==
UDC 004.7

Parfenov D.I., Bolodurina I.P., Ushakov Yu.A.

Orenburg State University, Orenburg, Russia

NEURAL NETWORK MODEL FOR IDENTIFICATION VIRTUAL NETWORK FUNCTIONS IN MULTI-CLOUD PLATFORM AND ALGORITHMIC SOLUTIONS TO OPTIMIZE NETWORK WORK IN THE INFRASTRUCTURE OF THE VIRTUAL DATA CENTER

Proceedings of the II International scientific conference "Convergent cognitive information technologies" (Convergent’2017), Moscow, Russia, November 24-26, 2017

===Abstract===
The article describes an approach to developing a neural network model for identifying virtual network functions. Our solution is based on an analysis of the statistical properties of the flows circulating in the network of the virtual data center and of the characteristics that describe the content of packets transmitted through network objects. This enabled us to establish the optimal set of attributes for identifying virtual network functions. Using the data obtained in our research, we developed an algorithm for optimizing the placement of virtual network functions. Our approach uses a hybrid method of virtualization based on virtual machines and containers, which reduces the infrastructure load and the response time in the network of the virtual data center. The approach to placing virtual network functions allows traffic flows in the virtual data center to be optimized. The algorithmic solution is based on neural networks, which allows it to scale to any number of network function copies.

===Keywords===
Virtual data center; data mining; neural network; multi-cloud platforms; network function virtualization.

===Introduction===
Today, commercial and state organizations, including industrial enterprises (in electric power, machine building, mining and mineral processing, etc.), are moving their information infrastructure from physical data centers to virtual data centers. As a rule, important business applications and services, as well as the data they process, are then hosted on multi-cloud platforms. The reason for this approach is that a physical data center infrastructure does not allow fully flexible management of network and computing resources.
To ensure the smooth operation of modern business applications in the data center network, it is necessary to maintain the required quality of service (QoS) and the required level of information security in accordance with the end users' tasks. This requires fine-tuning and, in some cases, the allocation of individual network nodes to solve the users' tasks effectively. In practice, this is not always possible, mainly because of the technical, functional and quantitative restrictions imposed by real physical network devices.

One of the main problems of building an IT infrastructure on a physical data center is the heterogeneity of the equipment used to deploy services in a converged network. The leaders of the global network equipment market (Cisco Systems, Huawei, Juniper, HP, and others) develop their own tools for monitoring and managing network devices and protocols. However, existing solutions cannot provide full-fledged management of all network objects, which in turn prevents flexible and prompt adaptation of the data center infrastructure to the current tasks of users. In addition, the high cost of network equipment significantly limits the data center network in terms of scaling and reserving resources. The lack of hot standby or scaling of critical nodes can lead to long downtime while failed network objects are replaced or upgraded to expand functionality.

However, the concept of resource virtualization is not fully effective either. It allows the processed and transmitted data flows to be abstracted from physical devices, but the problem of effectively placing the key components of the virtual network environment in a multi-cloud platform remains unsolved.
One of the approaches applied in virtual data centers, apart from the virtualization of traditional network infrastructure objects, is the use of software implementations instead of traditional hardware solutions such as firewalls, load balancers, NAT, routers and others [2]. In practice, such solutions are based on network function virtualization (NFV). NFV provides more flexible deployment and more effective control of the virtual objects of a multi-cloud platform that perform the roles of hardware network devices [1]. As a rule, NFV is applied together with software-defined networking (SDN) and makes adaptive traffic control possible.

However, NFV has a number of disadvantages. The main problem is the lack of effective methods for planning the placement of virtual objects on physical computing nodes. A review of the research shows that existing solutions for placing NFV in the data center infrastructure rely on either virtual machines or containers [3]. They do not take into account the resource intensity of each virtual network function or its functional purpose in the multi-cloud infrastructure of a virtual data center.

We have developed an approach that first clusters the existing virtual and physical infrastructure objects and then places the virtual network functions. The main idea of our solution is to estimate the resources consumed by each element of the network. In addition, we use a hybrid method of virtualization, based on the simultaneous use of virtual machines and containers, to create a flexible solution. This makes it possible to optimize the placement of virtual network functions in the infrastructure of a virtual data center.
Our approach is relevant because it combines two modern technologies, software-defined networking and network function virtualization, for resource and data flow control. The goal of our investigation is to improve the quality of service for the applications and services of multi-cloud platforms hosted in a virtual data center. To this end, we use data mining methods to process information about the state and load of key objects, as well as about the flows between network devices, received from the monitoring systems of the computing nodes in the software-defined infrastructure of the multi-cloud platform. This yields a consolidated assessment of the quality of service and allows the uninterrupted operation of the software-defined infrastructure of the multi-cloud platform, and of the entire virtual data center, to be predicted.

The rest of the paper describes our approaches to optimizing the placement of virtual network functions in the multi-cloud environment of a virtual data center. Section 2 describes the methods and approaches applied in our solution and the main stages of its implementation. Section 3 presents the neural network model that serves as the basis for building placement maps of network functions in the multi-cloud environment of the virtual data center. Section 4 gives the algorithmic solution that optimizes the placement of network functions in practice. Section 5 presents the results of the experimental investigation in the network environment of the virtual data center. The Conclusion summarizes our investigation and outlines future work.
===Methods and approaches===
Neural networks are currently among the most effective and fastest methods for forecasting, parameter identification, clustering and classification in various fields of knowledge. There are many successful examples of the neural network approach being applied to build intelligent information systems [1, 8, 9]. A further advantage of neural networks is the possibility of adaptive self-training with the help of additional methods.

We have used an iterative approach based on a group of methods for optimizing the placement of virtual network functions on the objects of the software-defined infrastructure of the virtual data center. First, we represent all network objects of the multi-cloud platform hosted in the virtual data center as a communication graph. The graph is based on the topology of the physical network switching. Each network object is a graph vertex described by a basic set of parameters that characterize the network element and affect its performance. As parameters we have chosen the amount of memory, the amount of disk space, the frequency and number of CPU cores, etc. These characteristics are later used as input parameters forming the training set for the neural network.

A multi-cloud platform hosts various applications and services. Identifying the traffic flows that pass through the infrastructure facilities of the data center is therefore an important task for the placement of virtual network functions. To obtain this information, we first used a method based on the analysis of the well-known network ports of popular applications.
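The port-based method can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Flow` record, the function names and the small port table (a handful of IANA-registered ports) are assumptions chosen only for the example.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical lookup table: a few IANA-registered ports of popular applications.
WELL_KNOWN_PORTS = {
    22: "ssh",
    53: "dns",
    80: "http",
    443: "https",
    3306: "mysql",
}


@dataclass(frozen=True)
class Flow:
    """One observed traffic flow (illustrative fields only)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str  # "tcp" or "udp"


def classify_flow(flow: Flow) -> str:
    """Label a flow by its destination port, then by its source port."""
    for port in (flow.dst_port, flow.src_port):
        app = WELL_KNOWN_PORTS.get(port)
        if app is not None:
            return app
    # Non-standard ports cannot be resolved by this method.
    return "unknown"


def summarize(flows) -> Counter:
    """Count flows per detected application."""
    return Counter(classify_flow(f) for f in flows)


flows = [
    Flow("10.0.0.5", "10.0.1.9", 51512, 443, "tcp"),
    Flow("10.0.0.7", "10.0.1.9", 40100, 443, "tcp"),
    Flow("10.0.0.5", "10.0.2.3", 53211, 53, "udp"),
    Flow("10.0.0.8", "10.0.3.4", 49152, 9999, "tcp"),
]
summary = summarize(flows)
print(summary)
```

The last flow uses a port that is not in the table and is therefore reported as `unknown`, which is the kind of case the deeper methods are meant to resolve.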
The port-based method gives only an aggregate classification of traffic flows; because service-oriented applications use non-standard network solutions, the data obtained are not sufficient for effective control of traffic flows. For a deeper analysis of traffic flows, we propose a second method: decoding communication protocols by analyzing the contents of the transmitted packets. Since this approach is rather resource-intensive, it is used only at the low level of the analysis, for more exact identification of the traffic flows of similar applications. The third method is a pattern-based approach that identifies the application by specific signatures located in the protocol header. The fourth method is based on machine learning. It takes the data accumulated by the above methods and applies machine learning algorithms to identify applications from characteristic packets and the accumulated statistics of data flows. The advantage of this approach is that the algorithms can be trained in real time, which allows the software-defined infrastructure of the virtual data center to be reconfigured on the fly.

The proposed solution is thus based on an integrated approach to collecting data on the traffic flows circulating in a multi-cloud platform, which makes it possible to optimize the placement of network functions on the computing nodes of the virtual data center.

To achieve the goals of the research, we have created a neural network system that predicts the placement of virtual network functions in the multi-cloud environment of the virtual data center. This involves the successive implementation of a number of algorithmic and software solutions. First, a data collection module is implemented for the neural network system; it gathers the sets of primary data about the state of the network infrastructure of the virtual data center.
The information obtained is needed to train and test the neural network. The next stage is to use the data to define the optimal scheme for the placement of virtual network functions and to validate it experimentally on samples whose results are known a priori. This allows us to correct the sets of input data and to improve the quality of the results at the neural network output. The final stage is to test the system on examples that are not included in the training sample, which confirms the validity of the results.

===Neural network model of the identification of network functions in the infrastructure of the virtual data centers===
We have chosen the Kohonen network as the neural structure for modeling, since it is efficient at the clustering and classification of objects. Another important factor is the visualization of results, which improves the understanding of the structure and character of the data at early stages and helps to refine the neural network model. Given the peculiarities of network function virtualization, the classification capability of the Kohonen network can be used to identify homogeneous elements in the network and then optimize their placement.

The Kohonen network is trained by successive approximations. Starting from a randomly selected initial placement of objects, the algorithm gradually improves it to cluster the data. Another advantage of the Kohonen network is its ability to identify new clusters. The trained network detects clusters in the training data and assigns all the data to particular clusters. If the network then encounters a set of data that differs from every known sample, it reveals a new cluster of elements on its own.
This feature is very useful, since it allows new network functions to be introduced into the architecture of the virtual data center without actually changing the algorithms that distribute them over physical and virtual computing nodes.

The neural network system for optimizing the placement of virtual network functions in the multi-cloud environment of the virtual data center is built on the following principle. Using the data obtained from the monitoring systems of the virtual data center, we have selected a number of criteria that make it possible both to identify a virtual network function and to assess its load on the computing nodes. The criteria are formulated so that the answer can always be represented in binary form, i.e. 1 for "Yes" and 0 for "No". The data obtained form the signal vector <math>E = (e_1, e_2, \ldots, e_n)</math>, which is fed to the input of the neural network. The vector of output values is similar: it has binary components.

The neural network is a two-dimensional matrix of neurons of dimension n (the number of inputs of each neuron) by m (the number of neurons). The number of inputs of each neuron is determined by the number of criteria established earlier. The number of neurons m coincides with the required number of classes and corresponds to the number of unique network functions used in the multi-cloud platform. The importance of each neuron input is characterized by a numerical value called its weight. The weights are given in the form of the matrix

<math>X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1j} \\ x_{21} & x_{22} & \cdots & x_{2j} \\ \vdots & \vdots & \ddots & \vdots \\ x_{i1} & x_{i2} & \cdots & x_{ij} \end{pmatrix} \qquad (1)</math>

whose elements are the vectors of connection weight coefficients <math>x_{ij} = (x_1^{ij}, x_2^{ij}, \ldots, x_n^{ij})</math>.

The Kohonen network consists of three layers of neurons. The basis of the network is the hidden Kohonen layer. In this research, however, we propose a modified scheme of the output neurons of the Kohonen network so that it both identifies the purpose of a network object and, simultaneously, detects critical load on a computing node (Fig. 1).
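To illustrate how a Kohonen layer of m neurons with n inputs can cluster binary criteria vectors, the following is a minimal sketch in Python with NumPy. It is a winner-take-all simplification without a neighbourhood function and is not the authors' implementation; the function names, the `init_idx` parameter, the learning-rate schedule and the sample vectors are assumptions made for the example.

```python
import numpy as np


def train_kohonen(samples, n_neurons, epochs=50, lr0=0.5, init_idx=None, seed=0):
    """Train an (n_neurons x n_inputs) weight matrix by competitive learning.

    samples  -- binary criteria vectors, one row per observed network object
    init_idx -- hypothetical option: indices of the samples used as initial
                weights; chosen at random when omitted
    """
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    if init_idx is None:
        init_idx = rng.choice(len(samples), size=n_neurons, replace=False)
    weights = samples[np.asarray(init_idx)].copy()
    for epoch in range(epochs):
        # Learning rate decays linearly from lr0 towards zero.
        lr = lr0 * (1.0 - epoch / epochs)
        for e in samples[rng.permutation(len(samples))]:
            # The winner is the neuron whose weight vector is closest to e.
            k = np.argmin(np.linalg.norm(weights - e, axis=1))
            # Only the winner moves towards the input vector.
            weights[k] += lr * (e - weights[k])
    return weights


def identify(weights, e):
    """Return the index of the winning neuron (cluster) for input vector e."""
    e = np.asarray(e, dtype=float)
    return int(np.argmin(np.linalg.norm(weights - e, axis=1)))


# Four objects described by five binary criteria (1 = "Yes", 0 = "No").
samples = [
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1],
]
weights = train_kohonen(samples, n_neurons=2, init_idx=[0, 1])
labels = [identify(weights, e) for e in samples]
print(labels)  # prints [0, 1, 0, 1]
```

Seeding the weights from actual samples is a design choice of this sketch that avoids neurons which never win; a full self-organizing map would additionally update the neighbours of the winner.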
Fig. 1. A neural network model of virtual network function identification (input layer <math>e_i</math>; hidden layer with weights <math>x_{ij}</math>, split into neuron sets I and II; output layer with Y1 and Y2).

We propose to divide the hidden layer of the Kohonen neural network into two sets. The first set of neurons [1…K] is responsible for identifying the network function placed in the virtual data center. As the network adjusts its input weights, the output layer activates the linear function Y1, which takes a value in [0…n], where 0 means that the network object under study shows no signs of a virtual network function, and values from 1 to n correspond to a particular network function identified by the neural network model. The second set of neurons [L…Z] analyzes the load of the network object under study and activates the function Y2 at the output, which takes the values [-1, 0, 1], where 0 is the normal state, -1 means that the network function is idle or does not perform its functions, and 1 means that the network function is overloaded.

The basic criteria used as input data to detect virtual network functions are network records and events, the time packets take to pass through a network object, packet input and output times, memory load, CPU usage, data flow intensity, TTL and others. The data collected in the network are fed to the input of the neural network and form a complete neural network. We therefore simplify and verbalize the neural network by excluding some elements without a significant loss of detection quality.

To train the neural network, we used data obtained from the monitoring system of the virtual data center of Orenburg State University. It includes 4 OpenFlow switches (2 x HP 3500yl, 2 x Netgear GSM7200), 8 computing nodes (32 GB RAM, 4 cores), 1 server (32 GB RAM, 8 cores) with the OpenFlow controller and 1 server (32 GB RAM, 4 cores) for the monitoring function.
The routers are interconnected by 1000 Mbit/s links, and the computers are connected to a layer 3 router via layer 2 network connections at 100 Mbit/s. For the experimental study we chose the most popular virtual network functions (Router, NAT, Firewall, Proxy, Switch, DPI).

{| class="wikitable"
|+ Table 1. The result of experimental identification of virtual network functions
! Virtual network function !! Number of instances !! Correct detection
|-
| vRouter || 20 || 19 (98%)
|-
| vNAT || 15 || 13 (94%)
|-
| vFirewall || 18 || 17 (93%)
|-
| vProxy || 25 || 24 (96%)
|-
| vSwitch || 30 || 28 (93%)
|-
| vDPI || 16 || 14 (87.5%)
|}

The data obtained lead to the conclusion that the developed neural system had difficulty detecting some network functions because they differ only slightly in the chosen parameters. This shortcoming can be eliminated by introducing additional criteria into the initial neural network model. Thus, the developed neural network system identifies virtual network functions correctly in 94% of cases on average, while its resource intensity is insignificant. This increases the efficiency of optimizing their placement in the infrastructure of a multi-cloud platform.

===The algorithm for optimizing the placement of virtual network functions in the virtual data center===
Our model for identifying virtual network functions makes it possible to optimize their placement in the virtual data center. We optimize the placement of the network functions found by the Kohonen network in the virtual data center according to the following criteria: the current load on the computing nodes; the resource intensity of a network function; the number of flows passing through the computing nodes. The main objective of virtual network function placement is to choose the optimal number of nodes to implement the required functionality as a software solution. This gives rise to a resource planning problem.
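One simple way to combine the three criteria named above (node load, resource intensity of the function, number of flows through the node) is sketched below in Python. This is an illustrative heuristic under stated assumptions, not the authors' planning method: the `Node` record, the `choose_node` function and the `max_load` threshold of 80% are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A computing node as seen by the planner (illustrative fields)."""
    name: str
    load: int   # current load, percent of capacity
    flows: int  # number of flows currently passing through the node


def choose_node(nodes, intensity, max_load=80):
    """Pick a node for one copy of a network function.

    intensity -- load (percent) the function is expected to add
    max_load  -- hypothetical ceiling on node load after placement
    Returns None when no node has enough headroom.
    """
    feasible = [n for n in nodes if n.load + intensity <= max_load]
    if not feasible:
        return None
    # Prefer the least loaded node; break ties by the smaller number of flows.
    return min(feasible, key=lambda n: (n.load, n.flows))


nodes = [Node("node-a", 70, 10), Node("node-b", 30, 40), Node("node-c", 30, 12)]
chosen = choose_node(nodes, intensity=20)
print(chosen.name)  # prints node-c
```

Here `node-a` is excluded because placement would push it above the ceiling, and `node-c` wins the tie with `node-b` because fewer flows pass through it.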
Planning is particularly relevant for organizing a dynamic topology in the virtual data center, since the load on computing nodes can vary widely over rather short time intervals and depends on the chosen type of placement of specific network functions. To solve the optimization task, we have developed an algorithm for monitoring the infrastructure of the virtual data center and for placing and launching network functions. Unlike the available analogs, the algorithm uses heuristic analysis of traffic flows and classifies them by the type of network function. The general algorithm consists of the following steps.

Step 1. Identify the arrangement of virtual network functions, taking into account the topology of the network infrastructure of the virtual data center.

Step 2. Estimate the number of launched copies of each virtual network function and rank them by their popularity in the network infrastructure. Popularity is estimated from the traffic flows that run through the launched copies of a virtual network function.

Step 3. Determine the load that each copy of a function creates on physical and virtual computing nodes.

Step 4. Using the data obtained in steps 1 and 2, compare the data and determine the virtual network functions that need to be scaled up or folded.

Step 5. Reconfigure the topology on the controller of the software-defined network; stop the network functions that need folding and release the resources of the virtual data center they occupy.

Step 6. Evaluate the placement method for the virtual network functions that need scaling and create the maximum load on the infrastructure. Deploy the most loaded network function using the hybrid placement method (containers deployed in a virtual machine). Transfer the network functions that are less loaded but need scaling to the virtual machine operating mode.

Step 7. Migrate the virtual machines with network functions to the least loaded computing nodes.

The approach used in this algorithm to control the placement of virtual network functions takes the placement method into account, organizes the work of the virtual data center with regard to the circulating traffic flows, and regulates the number of launched copies of each function.

===Experimental results===
The purpose of the experimental investigation is to determine the efficiency of the developed algorithm for placing virtual network functions in the infrastructure of the virtual data center. For each virtual network function, we created container placement templates and virtual machine images for deployment in the software-defined infrastructure of the data center. To imitate the operation of the network environment in the data center, we generated flows based on statistical information using a request generator. To estimate the performance of virtual network functions, we used flows of varying intensity. In the first case, the flows created the minimum permissible load, which made it possible to estimate the response time and the delays introduced by the infrastructure of the virtual data center (exp. 1). In the second case, we created a working load on the virtual network functions in the infrastructure of the virtual data center, which made it possible to estimate the response time for applications in the network environment (exp. 2). In the third case, we evaluated the operation of the virtual network environment of a multi-cloud platform using the developed algorithm for optimizing the placement of virtual network functions (exp. 3). In this case, we determined the resource consumption of each launched copy of a virtual network function, which enabled us to predict the resources required in the network environment. The results of the experiment are shown in Fig. 2.
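For reference, the decision part of the placement algorithm evaluated in experiment 3 (Steps 4 to 7: folding idle copies, choosing the placement mode for copies that need scaling, and selecting the least loaded node) can be sketched in Python as follows. This is a simplified illustration, not the authors' code: the `VnfCopy` record, the `plan` function, the action labels and the `LOAD_PER_COPY` constant are hypothetical, and the load state is assumed to come from the Y2 output of the identification model.

```python
from dataclasses import dataclass

LOAD_PER_COPY = 25  # hypothetical load (percent) added to a node by one new copy


@dataclass
class VnfCopy:
    """One launched copy of a virtual network function (illustrative fields)."""
    name: str
    function: str    # e.g. "vFirewall"
    flows: int       # traffic flows passing through this copy
    load_state: int  # -1 idle, 0 normal, 1 overloaded


def plan(copies, node_load):
    """Return a list of (copy name, action, target node) tuples."""
    node_load = dict(node_load)  # work on a copy, leave the caller's data intact
    actions = []
    # Idle copies are folded so that their resources can be released.
    for c in copies:
        if c.load_state == -1:
            actions.append((c.name, "fold", None))
    # Overloaded copies are scaled, the most popular one first.
    to_scale = sorted((c for c in copies if c.load_state == 1),
                      key=lambda c: c.flows, reverse=True)
    for rank, c in enumerate(to_scale):
        # Hybrid mode for the most loaded function, plain VM mode for the rest.
        mode = "containers-in-vm" if rank == 0 else "vm"
        # The new copy goes to the currently least loaded node.
        target = min(node_load, key=node_load.get)
        node_load[target] += LOAD_PER_COPY
        actions.append((c.name, "scale:" + mode, target))
    return actions


copies = [
    VnfCopy("fw1", "vFirewall", 120, 1),
    VnfCopy("nat1", "vNAT", 40, 1),
    VnfCopy("px1", "vProxy", 0, -1),
    VnfCopy("sw1", "vSwitch", 60, 0),
]
actions = plan(copies, {"node-a": 70, "node-b": 20, "node-c": 40})
for action in actions:
    print(action)
```

With these sample values the idle proxy is folded, the firewall is scaled in hybrid mode onto `node-b`, and the NAT copy is scaled as a virtual machine onto `node-c`, because `node-b` has already absorbed the first new copy.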
Fig. 2. Dependence of the response time in the multi-cloud platform on the number of virtual network objects in the data center (vertical axis: response time of a request in the network, ms; horizontal axis: number of network objects; series: exp. 1, exp. 2, exp. 3).

The research has shown that the static placement of containers on physical nodes is inefficient, since it does not allow the load to be redistributed quickly. Moreover, transferring a container to another computing node leads to the loss of current connections. Placing virtual network functions in virtual machines was more efficient thanks to flexible load balancing; however, the load on computing nodes increased considerably because of the additional overhead of virtual machines. In our research, the most efficient option was placement using containers inside virtual machines. It increased the density of applications and services in software-defined data centers. It also made it possible to place the containers and the applications of services and data in direct network proximity to each other, to reduce the response time to user requests and, thus, to increase the overall performance of the system.

===Conclusion===
Our research has produced a neural network model for identifying virtual network functions. The proposed classification model, based on the statistical properties of the flow, defines a systematic approach to selecting the optimal set of traffic flow attributes. The developed algorithmic solutions, which are based on hybrid methods of virtualization and use the data of the neural network model, made it possible to optimize the placement of network functions in software-defined segments of the data center network.
The results show that optimizing the number of virtual network functions improves the quality of service by 20-25% by reducing the response time and the load on physical network devices. This became possible because applications are identified at the initial stage using the data on the placement of virtual network functions in the network of the virtual data center. In the future, we are going to evaluate our approach with a larger number of virtual network functions, which will allow us to assess the accuracy of our solution.

===Acknowledgment===
The research work was funded by the Russian Foundation for Basic Research, according to the research projects No. 16-37-60086 mol_а_dk, 16-07-01004, 17-47-560046, and by the President of the Russian Federation within the grant for state support of young Russian scientists (MK-1624.2017.9).

===References===
1. Bolodurina I. P., Parfenov D. I. A model of cloud application assignments in software-defined storages // Journal of Physics: Conference Series. Vol. 803.

2. Bolodurina I. P., Parfenov D. I. Development and Research of Models of Organization Distributed Cloud Computing Based on the Software-defined Infrastructure // Procedia Computer Science. Vol. 103. P. 569–576.

3. Yong Li, Min Chen. Software-Defined Network Function Virtualization: A Survey. IEEE, 2015.

4. Ruozhou Yu, Guoliang Xue, Vishnu Teja Kilari, Xiang Zhang. Network function virtualization in the multi-tenant cloud. IEEE, 2015.

5. Aaron Gember-Jacobson, Raajay Viswanathan, Chaithan Prakash, Robert Grandl, Junaid Khalid, Sourav Das, Aditya Akella. OpenNF: Enabling Innovation in Network Function Control. IEEE, 2014.

6. Ying-Dar Lin, Po-Ching Lin, Chih-Hung Yeh, Yao-Chun Wang, Yuan-Cheng Lai. An extended SDN architecture for network function virtualization with a case study on intrusion prevention. IEEE, 2015.

7. Hassan Jameel Asghar, Luca Melis, Cyril Soldani, Emiliano De Cristofaro, Mohamed Ali Kaafar, Laurent Mathy. SplitBox: Toward Efficient Private Network Function Virtualization. IEEE, 2016.

8. Parfenov D., Bolodurina I. Development and research of models of organization storages based on the software-defined infrastructure // 39th International Conference on Telecommunication and signal processing: materials of conference, 27-29 June 2016, Vienna, Austria. 2016. P. 1-6.
Note on the authors:

Parfenov Denis I., Candidate of Engineering Sciences, head of the department of software and technical support of distance learning, Orenburg State University, parfenovdi@mail.ru

Bolodurina Irina P., Doctor of Engineering Sciences, Full Professor, head of the department of applied mathematics, Orenburg State University, prmat@mail.osu.ru

Ushakov Yury A., Candidate of Engineering Sciences, assistant professor, head of the information technology sector of the Center for Information Technologies, Orenburg State University, unpk@mail.ru