Reliability-based Design of Network Structure Systems Using the Monte-Carlo Method *

Aleksander Moshnikov [0000-0002-3689-2472]

ITMO University, Saint-Petersburg, 197101, Russia
moshnikov.alex@gmail.com

Abstract. The article is devoted to the choice of reliability-oriented solutions for building an enterprise management system. The Monte Carlo method is used as the reliability analysis tool; it provides an estimate of the reliability index with a given confidence probability. A software implementation in the R language is proposed, and a quantitative example is provided.

Keywords: Monte-Carlo Method, Reliability Assessment, ERP-MRP Systems.

1 Introduction

Modern enterprises accumulate a huge amount of information, such as documentation, graphic and video information from access control systems, and operation data of technological equipment and machines. Accumulating such volumes of information creates challenges in storing it, protecting it, and providing access to it. Enterprise Resource Planning (ERP) and Material Requirements Planning (MRP) systems are widely used to solve such problems. They include the appropriate software and the necessary infrastructure. Designing such systems is a complex task that involves finding a compromise between usability, cost, and reliability.

Reliability-based design optimization (RBDO) is an approach to distributing reliability requirements and selecting architectural solutions that provide a given level of reliability of the system as a whole. It makes it possible to trade off an increase in reliability against a decrease in cost [1].

The reliability allocation and optimization problem has been treated by many authors, although most of the attention has been given to the redundancy allocation problem [2-4]. Aspects of computing reliability are given in [12-13].

* Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2 Architecture of the ERP-MRP system

2.1 OSI model

In the Open Systems Interconnection (OSI) model, the information on the logical network diagram corresponds to L3-level information. The L3 layer is an abstraction layer that reflects how packets are forwarded through intermediate routers. At the L2 level, data channels between neighboring nodes are represented, while at the L1 level only their physical location is shown.

According to this model, the network is divided into three logical levels: the core of the network (high-performance devices whose main purpose is fast transport); the distribution layer (provides security policies, aggregation and routing in VLANs, and defines broadcast domains); and the access level (usually L2 switches connecting end devices).

2.2 Typical architectures

"Bus". A characteristic feature of the "bus" topology is a single data transmission line to which all subscriber devices are connected and over which they take turns exchanging data. The transmitted data is available to all subscribers connected to the trunk. Data parameters are set in such a way that the addressee (recipient) identifies them uniquely.

"Ring". In contrast to the "bus" topology, the ring topology connects subscribers in series, so the information flow passes from one device to the next in turn. The message parameters contain markers that the receiving device uses to determine whether it is the recipient. If it is, the message is considered delivered; if not, the message is passed further along the network.

"Star". In modern architecture, the most common topology is the "star" type. The connection scheme for this topology requires a switching device that provides addressing and distributes information flows between subscribers over separate communication channels.

"Cell".
A feature of the "cell" topology is that subscriber devices also act as switching devices. Each subscriber device is connected by four communication channels. The advantage of this structure is its high reliability: each point has at least four communication channels to other subscriber devices.

3 Reliability allocation

3.1 Allocation techniques

Reliability allocation is a crucial step in any product development process: it assigns failure rate targets to the individual system units so that the desired reliability goal for the whole system is reached.

The optimization task may be to maximize the reliability index under specified limits on the available resources, or to minimize resource consumption while reaching the required level of reliability. Distributing a given reliability $R^{*}$ over the system elements requires solving the following inequality:

$$f(R_1, R_2, \ldots, R_n) \ge R^{*}, \tag{1}$$

where $R_i$ is the specified probability of failure-free operation of the i-th element and $f$ is the functional relationship between the elements and the system.

The allocation procedure is iterative. The first step starts from the initial plan, when few data about the components are available.

Various reliability allocation methods have been discussed and developed over the last several decades. One existing approach combines one or several criteria in different ways to obtain an allocation weight and allocates reliability in proportion to that weight [4]. For example, the Advisory Group on Reliability of Electronic Equipment (AGREE) method builds complexity into the allocation weight [5], and the Aeronautical Radio Inc. (ARINC) method uses the failure rate as the allocation weight [6].
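As a rough illustration of weight-proportional allocation, the sketch below assumes a series system of elements with exponentially distributed times to failure, so that the system failure rate is the sum of the element rates. Each element's weight is its share of the current total failure rate, in the spirit of the ARINC method. The sketch is written in Python for brevity and is not taken from the paper's R script; the function name, element names, failure rates, target reliability, and mission time are all hypothetical.

```python
import math


def allocate_failure_rates(current_rates, target_reliability, mission_hours):
    """Allocate failure-rate targets in proportion to current failure rates.

    Assumption: series system, exponential times to failure, so the system
    failure rate equals the sum of the element failure rates.
    """
    if not 0.0 < target_reliability < 1.0:
        raise ValueError("target_reliability must lie strictly between 0 and 1")
    # Largest system failure rate that still gives R* at the mission time.
    system_target_rate = -math.log(target_reliability) / mission_hours
    total_rate = sum(current_rates.values())
    # Weight of an element = its share of the current total failure rate.
    return {
        name: system_target_rate * rate / total_rate
        for name, rate in current_rates.items()
    }


# Hypothetical two-element system and requirement (not data from the paper).
current = {"unit_1": 4e-6, "unit_2": 6e-6}
allocated = allocate_failure_rates(current, target_reliability=0.99,
                                   mission_hours=5000)
# Reliability of the series system if every element meets its allocated rate.
achieved = math.exp(-sum(allocated.values()) * 5000)
```

With these inputs the allocated rates keep the original 4:6 ratio, and a series system that meets them reaches the target reliability at the mission time.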
Another conventional reliability allocation method focuses on multi-objective optimization, including cost minimization [7] and redundancy allocation [8].

3.2 Network reliability modeling

R is a programming language for statistical data processing and graphics, and a free open-source software environment for computing under the GNU project. The R language includes tools for running several calculation threads in parallel (by loading several processor cores simultaneously), which reduces the modeling time several-fold. The igraph library is used for statistical modeling of the reliability of the automated control system; it implements a large number of graph algorithms and allows flexible manipulation of graphs (removing a vertex, adding a vertex, etc.). Paths between given vertices are searched with the breadth-first search algorithm (the implementation in the igraph library is used). Exponentially distributed random numbers are generated with the base functions of the R language [9].

All the functions and the statistical modeling algorithm are written in one script. Modeling consists of running this script with references to the graph description (a list of graph edges), the system failure conditions (in terms of graph paths), and the reliability data of the system elements (represented on the graph by vertices). The simulation results are a description of the system failure scenarios at each iteration of the simulation and the values of the random system failure events.

4 Numerical example

4.1 Description of ERP-MRP system

The ERP-MRP local area network contains the infrastructure needed to interconnect the systems and their individual functional blocks. In general, the network is based on the star topology and consists of main and auxiliary nodes.
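The simulation procedure of Section 3.2 can be sketched on a toy star network. The example below is a minimal Python illustration, not the author's R script: it uses only the standard library instead of igraph, and the vertex names, failure rates, mission time, and failure condition (the system fails when any workstation cannot reach the server) are assumptions made for the example. Each trial draws an exponential time to failure for every vertex, removes the vertices that failed before the mission time, and runs a breadth-first search over the surviving vertices.

```python
import math
import random
from collections import deque


def reachable(adjacency, alive, source):
    """Breadth-first search restricted to vertices that are still working."""
    if source not in alive:
        return set()
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour in alive and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def estimate_reliability(edges, failure_rates, source, targets,
                         mission_hours, trials, seed=0):
    """Monte Carlo estimate of P(all targets reach the source at mission_hours).

    Every vertex that appears in `edges` must have an entry in `failure_rates`.
    """
    rng = random.Random(seed)
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    successes = 0
    for _ in range(trials):
        # A vertex survives if its exponential failure time exceeds the mission.
        alive = {v for v, rate in failure_rates.items()
                 if rng.expovariate(rate) > mission_hours}
        # Assumed failure condition: some target is cut off from the source.
        if set(targets) <= reachable(adjacency, alive, source):
            successes += 1
    return successes / trials


# Hypothetical star network: one switch connects a server and two workstations.
edges = [("server", "switch"), ("switch", "ws1"), ("switch", "ws2")]
rates = {"server": 10e-6, "switch": 4e-6, "ws1": 3.5e-6, "ws2": 3.5e-6}  # h^-1
estimate = estimate_reliability(edges, rates, "server", ["ws1", "ws2"],
                                mission_hours=5000, trials=20000)
# In this toy case every vertex is needed, so the exact value is a series system.
analytic = math.exp(-sum(rates.values()) * 5000)
```

Because every vertex is required in this toy network, the estimate can be checked against the closed-form series reliability. For a network with redundant paths no such shortcut exists, and the simulated estimate is what remains.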
The system consists of the following units:

1. The hardware of the main computing resources (servers, central switches) is allocated to the data center (DC);
2. Server cabinet (SC) is designed for collecting, processing and storing information about the operation of ERP-MRP system equipment, as well as for information interaction;
3. Data storage cabinet (DSC) is designed for storing and processing large amounts of information and for archive management;
4. Main switching node (MSN), a part of the MSN cabinet;
5. Auxiliary switching node (ASN), a part of the ASN cabinet;
6. A workstation (WS) is a set of office computer equipment and system software installed at the workplace of the staff;
7. Connecting cabinet for adjacent systems (CCAS);
8. Switching node of the building (BSN).

It is assumed that the DC equipment has been identified and has the reliability indicators presented in Table 1.

Table 1. Reliability data of SC and DSC

Element model          Code   Failure rate, h⁻¹
Server cabinet         SC     121·10⁻⁶
Data storage cabinet   DSC    19·10⁻⁶

The architecture of the ERP-MRP system is presented in Fig. 1.

4.2 Simulation parameters

Reliability indicators are calculated for a sample with 100 cycles. To estimate the probability of failure, a sample with 5143 cycles is used, which provides a level of accuracy greater than 99%. The simulation results of failure probability are presented in Fig. 1. According to the results of the Monte Carlo simulation (Fig. 2), it can be argued that the probability that the ERP-MRP system functions for 5000 hours is no less than 0.9945 with a confidence probability of 0.90.

Fig. 1. Architecture of ERP-MRP system

[Histogram of MTBF. Horizontal axis: MTBF, h (0 to 60000); vertical axis: Frequency (0 to 120)]

Fig. 2.
Histogram of the distribution of MTBF values of the ERP-MRP system

To select the best option for building the system, several modeling iterations are performed to determine the reliability of the system for various models of purchased components. The MSN is built from commercially available hub components. CASN1-CASN3 and CCAS include switches. Data on equipment reliability are presented in Table 2.

Table 2. Initial reliability data

Element, model   Code   Failure rate, h⁻¹   Cost, c.u.
Switch A         S1     4·10⁻⁶              10000
Switch B         S2     3·10⁻⁶              15000
Switch C         S3     2·10⁻⁶              30000
Switch D         S4     1·10⁻⁶              90000
Hub A            H1     6·10⁻⁶              124000
Hub B            H2     2·10⁻⁶              213000
Hub C            H3     1·10⁻⁶              196000

4.3 Simulation results

Based on the simulation, 3 variants of the system construction were determined that meet the reliability requirement of 0.99. The simulation results are shown in Fig. 3.

[Plot of reliability by component model. Vertical axis: Reliability (0.975 to 1); categories: Hub A, Switch A, Hub B, Switch B, Switch C, Hub C, Switch D]

Fig. 3. The set of values of the probability of failure of the system corresponding to the quantile 0.80

When considering the 80% quantile of the failure probability, the following network hardware models can be selected as system elements: [S1;H1], [S2;H1], [S3;H1], [S4;H1], [S1;H2], [S1;H3]. The minimum cost is achieved by selecting [S1;H1] and equals 576,000 c.u.

To improve accuracy, methods that reduce the variance of the sample estimate can be used, for example the Cross-Entropy Monte-Carlo method [10, 11].

If the probability of failure-free operation does not meet the requirements for the system, then, to increase reliability, the significance of the elements must be evaluated, for example with the Birnbaum Importance Measure [9]. Increasing the reliability of the most significant elements will make it possible to achieve the required MTBF or failure probability.
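The Birnbaum importance of an element is the difference between the system reliability with that element known to work and with it known to have failed. The sketch below computes it by exact enumeration of component states for a small system. It is a Python illustration under stated assumptions, not part of the paper's R implementation: the structure (one hub in series with two redundant switches), the component names, and the reliabilities are hypothetical.

```python
from itertools import product


def system_reliability(structure, probs, fixed=None):
    """Exact system reliability by enumerating all component states.

    `structure` maps a dict {component: True/False} to True if the system works.
    `fixed` pins selected components to a known state.
    """
    fixed = fixed or {}
    free = [name for name in probs if name not in fixed]
    total = 0.0
    for states in product([True, False], repeat=len(free)):
        state = dict(zip(free, states))
        state.update(fixed)
        if structure(state):
            weight = 1.0
            for name, up in zip(free, states):
                weight *= probs[name] if up else 1.0 - probs[name]
            total += weight
    return total


def birnbaum_importance(structure, probs):
    """I_B(i) = R(system | i works) - R(system | i failed) for every component."""
    return {
        name: system_reliability(structure, probs, {name: True})
        - system_reliability(structure, probs, {name: False})
        for name in probs
    }


# Hypothetical structure: the hub is required, the two switches are redundant.
def works(state):
    return state["hub"] and (state["sw_a"] or state["sw_b"])


reliabilities = {"hub": 0.99, "sw_a": 0.95, "sw_b": 0.95}
importance = birnbaum_importance(works, reliabilities)
```

In this example the hub has an importance of about 0.9975 and each switch about 0.0495, so improving the hub would raise system reliability far more than improving either redundant switch. Enumeration grows as 2^n and is only practical for small systems; for larger networks the two conditional reliabilities would have to be estimated by simulation.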
5 Conclusion

An approach to the choice of reliability-oriented solutions for building an enterprise management system is proposed. The Monte Carlo method is used as the reliability analysis tool; it provides an assessment of the reliability index with a given confidence probability. A software implementation in the R language was developed, and its performance is demonstrated on a quantitative example. The selected equipment configuration provides the specified reliability of the ERP-MRP system at minimum cost.

References

1. Lee, Tae Won, and Byung Man Kwak. "A reliability-based optimal design using advanced first order second moment method." Journal of Structural Mechanics 15, no. 4 (1987): 523-542.
2. Misra K. B., and Sharma, Usha, “Multicriteria optimization for combined reliability and redundancy allocations in systems employing mixed redundancies,” Microelectronics and Reliability, Vol. 31, No 2, pp. 323-335, 1991.
3. Kuo, W. and Prasad, V.R. (2000). An annotated overview of system reliability optimization. IEEE Transactions on Reliability Engineering. Vol. 49: pp. 176–187.
4. O. P. Yadav and X. Zhuang, “A practical reliability allocation method considering modified criticality factors,” Reliability Engineering & System Safety, vol. 129, no. 9, pp. 57–65, 2014.
5. X. F. Liang, L. Y. Chen, H. Yi, and D. Li, “Integrated allocation of warship reliability and maintainability based on top-level parameters,” Ocean Engineering, vol. 110, no. 12, pp. 195–204, 2015.
6. M. Catelani, L. Ciani, G. Patrizi, and M. Venzi, “Reliability allocation procedures in complex redundant systems,” IEEE Systems Journal, vol. 12, no. 2, pp. 1182–1192, 2018.
7. A. Heidari, Z. Y. Dong, D. Zhang, P. Siano, and J. Aghaei, “Mixed-integer nonlinear programming formulation for distribution networks reliability optimization,” IEEE Transactions on Industrial Informatics, vol. 14, no. 5, pp. 1952–1961, 2018.
8. K. Khalili-Damghani, A.-R. Abtahi, and M.
Tavana, “A new multi-objective particle swarm optimization method for solving reliability redundancy allocation problems,” Reliability Engineering & System Safety, vol. 111, no. 3, pp. 58–75, 2013.
9. Birnbaum, Z. W., On the Importance of Different Components in a Multicomponent System, Multivariate Analysis - II, Edited by P. R. Krishnaiah, Academic Press, pp. 581–592, 1969.
10. K.-P. Hui, N. Bean, M. Kraetzl, Dirk P. Kroese. The Cross-Entropy Method for Network Reliability Estimation. Annals of Operations Research, 2005, Volume 134, Number 1, Page 101.
11. K-P. Hui, N. Bean, M. Kraetzl, and D. Kroese. The tree cut and merge algorithm for estimation of network reliability. Probability in the Engineering and Information Sciences, 17(1):25-45, 2003.
12. Moshnikov, A.; Bogatyrev, V. Risk Reduction Optimization of Process Systems under Cost Constraint Applying Instrumented Safety Measures. Computers 2020, 9, 50.
13. Bogatyrev V. A., Bogatyrev S. V., Bogatyrev A. V., Model and Interaction Efficiency of Computer Nodes Based on Transfer Reservation at Multipath Routing, 2019 Wave Electronics and its Application in Information and Telecommunication Systems (WECONF), Saint-Petersburg, Russia, 2019, pp. 1-4. doi: 10.1109/WE-CONF.2019.8840647