=Paper=
{{Paper
|id=Vol-3039/short5
|storemode=property
|title=Automated System for Determining a Damage Class for Sections of a Wastewater Disposal Network
|pdfUrl=https://ceur-ws.org/Vol-3039/short5.pdf
|volume=Vol-3039
|authors=Olena Shapovalova,Olga Starkova,Ganna Solodovnyk
|dblpUrl=https://dblp.org/rec/conf/ittap/ShapovalovaSS21
}}
==Automated System for Determining a Damage Class for Sections of a Wastewater Disposal Network==
Olena Shapovalova, Olga Starkova and Ganna Solodovnyk

Kharkiv National University of Civil Engineering and Architecture, Sumska street, 40, Kharkiv, 61002, Ukraine

===Abstract===
The purpose of the study is to justify and describe the design of an automated system for determining a damage class for sections of a wastewater disposal network using a modern data mining algorithm. The current stage of social development calls for mathematical methods, data mining models and the latest information technology even in such a well-established industry as public utilities. The proposed automated system lets the user distribute network sections among clusters, determine the cluster centers, and subsequently prioritize repair work. The paper proposes a new approach to determining a damage class for sections of a wastewater disposal network using clustering algorithms with a choice of one of two metrics (Manhattan distance and Euclidean distance). The software component, implemented in the Java programming language, uses a number of section criteria to determine which class each section belongs to, with the option of further ranking and prioritizing renovation work. The software is a web application which, given a good Internet connection, can be used at any time and from any location by professionals responsible for the trouble-free operation of wastewater disposal networks.

Keywords: damage class, wastewater disposal network sections, clustering, web application

===1. Introduction===
Among the challenges faced by public utilities in cities and localities across Ukraine, ensuring the smooth operation of wastewater disposal networks heads the list [1-3].
Sewer channels, whose surfaces are constantly exposed to aggressive substances, suffer typical damage over time, which leads to the failure of individual sections and to emergency situations. Timely diagnostics of the most vulnerable sections and of the network as a whole, followed by preventive measures to maintain reliable operation of the system, may significantly reduce the cost of repair in the event of an accident and ensure the provision of quality services to consumers [4-6]. Automating the processing of survey data on the condition of channels and ranking sections by the nature and extent of their damage is a relevant task that will minimize the associated labor costs.

Applying the latest developments in mathematical modeling, data mining and information technology to practical problems, in particular the construction, renovation and maintenance of complex systems [7-8], is common practice among modern researchers seeking optimum solutions and lower costs through high technology [8-10].

The authors of a number of papers [5-10] suggest different approaches to the issue. For instance, paper [5] considers a strategy for selecting potential objects for rehabilitation of water supply networks based on the analysis of the results of technical diagnostics of the condition of the sections according to a number of criteria. The criteria are the dynamics of changes in the reliability of particular sections, the presence of external destabilizing factors, the age, technical condition and repairability of pipelines, the likely residual useful life, the preliminary costs of rehabilitation, the ratio of the actual cost of pipelines to the costs required for their rehabilitation, and restrictions on financial costs.

The method and information system proposed in paper [6] for classifying sewer network sections by condition make it possible to determine the condition class of a sewer section, its rating points, and lists of both individual sections and entire sewers that need priority rehabilitation. Unfortunately, effective use of this method requires special training and a high level of mastery of special software tools, which often prevents it from being fully applied in the practice of planning repair and rehabilitation work. The information and technical support offered by the authors allows storing the results of technical diagnostics of the condition of sections in a data bank and retrieving the necessary data on request, in order to evaluate the listed criteria and to determine the area of the urban water supply network, or the area with the highest accident rate in pipelines (according to the diameters, materials and service life selected for analysis) [5, 6].

Attempts by domestic researchers to assess the condition of wastewater disposal network facilities and to draw up a priority list for their rehabilitation have been quite successful [7-10]. The information system proposed by the authors automates drawing up a list of priority facilities of a wastewater disposal network to be rehabilitated and, based on data about the condition of the sewer network, visualizes the network section of interest to the user with all its parameters [9].

ITTAP’2021: 1st International Workshop on Information Technologies: Theoretical and Applied Problems, November 16–18, 2021, Ternopil, Ukraine. EMAIL: shapovalova.olena@kstuca.kharkov.ua (A. 1); starkova@kstuca.kharkov.ua (A. 2); solodovnik@kn-it.info (A. 3). ORCID: 0000-0003-4566-6634 (A. 1); 0000-0002-9034-8830 (A. 2); 0000-0001-6323-5083 (A. 3). 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org).

===2. Materials and methods of research===
The reviewed studies do not give an unambiguous, general description that would allow sewer network sections to be classified by their technical condition. It is therefore relevant to consider the existing methods for classifying sewers by condition. Moreover, at the moment there are no automated programs for determining the damage class using cluster analysis [11-13].

The relevance of the study and of developing a software application based on its findings is confirmed by the sustained interest of researchers in the issue [1-10] and implies the need for an automated system for determining the damage class of wastewater disposal network sections. This application will be useful to specialists who maintain trouble-free operation of the wastewater disposal network and will help them develop better repair plans based on the analysis of large amounts of information and the classification of sections by level of damage.

The study aims to justify and describe the design of an automated system for determining a damage class for sections of a wastewater disposal network using a clustering algorithm [14-16]. The automated system is meant to distribute the wastewater disposal network sections among clusters (determining their centers) with subsequent prioritization of repair work [16-17].

The object of research is a software application that implements clustering algorithms to differentiate sections of the wastewater disposal network according to the extent of their damage. The subject of research is the technology for implementing algorithms for analyzing wastewater disposal network sections using the IntelliJ IDEA integrated development environment and the Java programming language.

===3. Research findings===
According to the analysis of the subject area, the input data for the automated system for determining the damage class of a wastewater disposal network section are the name of the section and the parameters of cross section reduction, cracks, deformation, pipe ruptures and reinforced concrete corrosion; the output data are the assignment of the section to a particular cluster together with the parameters of the cluster center.

The user can start the calculation on the input data, review the results and download the data to a file. When the input data are loaded, they are prepared and passed to the system for further analysis and identification of patterns. Starting the calculation launches data clustering: the user selects the metric, the number of clusters and the number of iterations and receives the distribution of sections by clusters [18-20]. The calculation results, namely a file with information about each section including its damage class and the coordinates of the cluster centroids, are stored; they can be reviewed at the user’s request and used later to make decisions on prioritization of repair work [20-23].

An activity diagram shows the sequence of actions of the automated system and the division of sections into clusters (Fig. 1).

Figure 1: Activity diagram

In the course of operation, the system performs the following algorithmic and mathematical calculations:
* forming a file from the output data;
* distributing the sections selected for review among clusters using the k-means clustering method [24-25];
* searching for cluster centroids using the Euclidean metric [26] or the Manhattan metric [27].

To perform data clustering using the k-means method, the number of clusters and the number of iterations must be specified before the calculation starts.
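The two metrics named above can be illustrated with a minimal Java sketch. The class name `Metrics`, its method names and the sample attribute values are assumptions made for illustration only; they are not taken from the system described in this paper.

```java
/** Illustrative distance metrics; class and method names are hypothetical. */
final class Metrics {
    private Metrics() {}

    /** Euclidean distance: square root of the sum of squared coordinate differences. */
    static double euclidean(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Manhattan distance: sum of absolute coordinate differences. */
    static double manhattan(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    private static void checkSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
    }

    public static void main(String[] args) {
        // Hypothetical section with five damage parameters, and a centroid at the origin.
        double[] section = {30, 0, 40, 0, 0};
        double[] centroid = {0, 0, 0, 0, 0};
        System.out.println(euclidean(section, centroid)); // prints 50.0
        System.out.println(manhattan(section, centroid)); // prints 70.0
    }
}
```

For the same pair of points the Manhattan distance is never smaller than the Euclidean one, so the choice of metric can change which centroid is nearest to a section.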
Each object is represented by a sequence of standardized attributes, i.e. the values of all parameters (attributes) are reduced to the interval [0, 100]. During operation, the algorithm assigns each object to the cluster whose centroid is nearest to it. These operations are repeated until the number of iterations reaches the maximum allowed. An additional condition for exiting the loop is reaching a state in which objects are no longer assigned to new clusters. At the output of the algorithm, the user receives the set of objects divided into clusters, together with the coordinates of the cluster centroids.

The class diagram is shown in Figs. 2 and 3.

Figure 2: The diagram of the classes, part 1

Figure 3: The diagram of the classes, part 2

The classes of the web application are placed in 5 packages, which are responsible for calculations, configuration, display and the application layers, such as Model, Service and Controller (Fig. 4).

Figure 4: Package structure of the application

The following tools were used to create the web application:
* Apache Maven, a framework that automates project builds based on a description of their structure in POM (Project Object Model) files;
* Tomcat, an open source servlet container that runs web applications and contains a number of self-configuration programs;
* TestNG, a testing framework for the Java programming language;
* JUnit, a library for unit testing of Java software, and its extension DBUnit;
* Bootstrap, a set of tools for creating websites and web applications;
* Spring Boot, an open source Java-based framework.

To start working with the web application, the user goes to the main page of the website (Fig. 5).
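The clustering procedure described earlier in this section (reducing attributes to [0, 100], assigning each object to the nearest centroid, and stopping at the iteration limit or when assignments no longer change) can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not the system's actual code: the class `KMeansSketch` and its methods are hypothetical, min-max scaling is assumed as the way attributes are reduced to [0, 100], the first k objects are assumed as initial centroids, and each centroid is recomputed as the mean of its members, as in standard k-means.

```java
import java.util.Arrays;
import java.util.function.ToDoubleBiFunction;

/** Illustrative k-means sketch; all names are hypothetical, not the paper's classes. */
final class KMeansSketch {
    private KMeansSketch() {}

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum);
    }

    static double manhattan(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
        return sum;
    }

    /** Min-max scaling of every attribute (column) to [0, 100]; an assumed form of standardization. */
    static double[][] scaleTo100(double[][] data) {
        int n = data.length, m = data[0].length;
        double[][] out = new double[n][m];
        for (int j = 0; j < m; j++) {
            double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
            for (double[] row : data) {
                min = Math.min(min, row[j]);
                max = Math.max(max, row[j]);
            }
            for (int i = 0; i < n; i++) {
                // A constant column is mapped to 0 (assumption) to avoid division by zero.
                out[i][j] = (max == min) ? 0.0 : 100.0 * (data[i][j] - min) / (max - min);
            }
        }
        return out;
    }

    /** Returns the cluster index of each object; requires 1 <= k <= data.length. */
    static int[] cluster(double[][] data, int k, int maxIterations,
                         ToDoubleBiFunction<double[], double[]> metric) {
        int n = data.length, m = data[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = data[c].clone(); // assumed initialization
        int[] labels = new int[n];
        Arrays.fill(labels, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: each object goes to the cluster with the nearest centroid.
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (metric.applyAsDouble(data[i], centroids[c])
                            < metric.applyAsDouble(data[i], centroids[best])) {
                        best = c;
                    }
                }
                if (labels[i] != best) {
                    labels[i] = best;
                    changed = true;
                }
            }
            if (!changed) break; // no object moved to a new cluster
            // Update step: each centroid becomes the mean of its members.
            double[][] sums = new double[k][m];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[labels[i]]++;
                for (int j = 0; j < m; j++) sums[labels[i]][j] += data[i][j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // an empty cluster keeps its previous centroid
                for (int j = 0; j < m; j++) centroids[c][j] = sums[c][j] / counts[c];
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Hypothetical sections, five damage parameters each.
        double[][] raw = {{5, 1, 0, 0, 2}, {80, 9, 7, 3, 60}, {6, 2, 1, 0, 1}, {75, 8, 6, 2, 55}};
        int[] labels = cluster(scaleTo100(raw), 2, 10, KMeansSketch::manhattan);
        System.out.println(Arrays.toString(labels)); // prints [0, 1, 0, 1]
    }
}
```

Passing the metric as a function argument mirrors the choice between the Euclidean and Manhattan metrics that the system offers to the user; swapping `KMeansSketch::manhattan` for `KMeansSketch::euclidean` changes only the distance used in the assignment step.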
Figure 5: The window of the main page of the website (form titled “Determining a damage class for sections of a wastewater disposal network”, with fields for the clustering algorithm (K-means), the metric (Euclidean or Manhattan), data file selection, the number of clusters, the number of iterations and centroid calculation, and a “Calculate” button)

In this form, the user can choose the number of clusters and the number of iterations, decide whether to calculate the centroids, and apply the Euclidean or the Manhattan metric. A .txt file with the calculation data must also be uploaded. Each line of the uploaded file must contain the numerical data of one section for 5 parameters (cross section reduction, cracks, deformation, pipe rupture and reinforced concrete corrosion) and its text name, separated by commas. If any parameter is missing, it may be replaced with the number 0. Clicking the “Calculate” button after uploading the input data produces the result. For ease of further use and analysis, the data can be presented in tabular form using the built-in tools of Excel (Fig. 6).

Figure 6: The distribution of sections by clusters in tabular form

===4. Discussion of the findings===
The developed automated system is used to determine the damage class for sections of a wastewater disposal network. The system is based on the main criteria, principles and methodological tools for classifying potential objects for renovation, which are covered in the research papers [2, 3], and on cluster analysis algorithms, which have a wide area of application and are discussed in the papers [11-14].

===5. Conclusions===
According to the findings of the study, approaches to the issue were considered using the example of well-known automated systems for determining the damage class of wastewater disposal network sections, the subject area was analyzed, and an activity diagram was drawn up, which served as the basis for designing the automated system. Based on the analysis of the subject area, the input and output data for the system were selected, and cluster analysis was chosen as the method for determining the priority of repair work. A system has been designed and implemented whose software component is a web application with client and server parts that automate determining the damage class of wastewater disposal network sections. The user can store the results of calculations in the database and process them further using spreadsheets. The system was tested and showed high speed of statistical data processing and 100% accuracy of distribution of sections by clusters on the test sample. In the future, it is planned to use the designed system for analyzing sections of a wastewater disposal network both in real time and for drawing up plans for renovation work.

===6. References===
* [1] Honcharenko D.F., Bondarenko A.I., Bulgakov V.V., Garmash A.A. On the issue of ensuring the maintainability of sewer tunnels in Kharkov. Naukovyi visnyk budivnytstva, 2016, issue 2, pp. 144–148.
* [2] Starkova O.V. Models of a reasonable choice of the method of repair and restoration of a section of the sewer network. Naukovyi visnyk budivnytstva, 2016, issue 3, pp. 80–85.
* [3] Starkova O.V., Shapovalova Ye.A., Gnuchikh L.A. Modeling the choice of the method of restoration of sewerage networks. Komunalne hospodarstvo mist, 2008, issue 85, pp. 19–26.
* [4] Bulgakov Yu.V. Research of the process of destruction of the construction of the sewer tunnel collector. Naukovyi visnyk budivnytstva, 2015, issue 5, pp. 79–84.
* [5] Orlov V.A., Kharkin V.A. Development of a strategy for the restoration of wastewater disposal networks. Stroitelstvo i Arkhitektura, 2001, pp. 15–32.
* [6] Korynko Y.V., Starkova O.V., Shevchenko A.A. Methodological foundations of computer modeling of wastewater disposal systems. Naukovyi visnyk budivnytstva, 2003, issue 23, pp. 223–229.
* [7] Honcharenko D.F., Starkova O.V., Aleynykova A.Y. Development of an automated system for selecting a method for restoring water pipelines using a fuzzy logic. Systemy obrobky informatsiyi, 2014, issue 8 (124), pp. 18–23.
* [8] Solodovnyk H.V., Deyneha A.A. Making multi-stage decisions using information technology. Naukovyi visnyk budivnytstva, 2019, issue 2, pp. 165–166.
* [9] Starkova O.V., Shapovalova E.A., Hnuchykh L.A., Bondarenko D.A. Development of an automated information system for determining priority objects for the renovation of water supply pipelines. Komunalne hospodarstvo mist, 2011, issue 99, pp. 312–316.
* [10] Honcharenko D., Shumakov I., Starkova O., Aleinikova A., Mikautadze R. Methodological and computer-based support for choosing underground utility networks renovation method. MATEC Web of Conferences, 2018, vol. 230, 02010.
* [11] Quackenbush J. Computational analysis of microarray data. Nature Reviews Genetics, 2001, vol. 2, no. 6, pp. 418–427.
* [12] Sugar C. A., James G. M. Finding the number of clusters in a dataset. Journal of the American Statistical Association, 2003, vol. 98, no. 463, pp. 750–763.
* [13] Tibshirani R., Walther G. Cluster validation by prediction strength. Journal of Computational and Graphical Statistics, 2005, vol. 14, no. 3, pp. 511–528.
* [14] Tibshirani R., Walther G., Hastie T. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 2001, vol. 63, no. 2, pp. 411–423.
* [15] Cuevas A., Febrero M., Fraiman R. Cluster analysis: a further approach based on density estimation. Computational Statistics & Data Analysis, 2001, vol. 36, no. 4, pp. 441–459.
* [16] Stuetzle W. Estimating the cluster tree of a density by analyzing the minimal spanning tree of a sample. Journal of Classification, 2003, vol. 20, no. 1, pp. 25–47.
* [17] Pelleg D., Moore A. W. X-means: Extending K-means with Efficient Estimation of the Number of Clusters. ICML, 2000, pp. 727–734.
* [18] Volkovich Z., Barzily Z., Toledano-Kitai D., Avros R. The Hotelling’s metric as a cluster stability measure. Computer Modelling and New Technologies, 2010, vol. 14, pp. 65–72.
* [19] Barzily Z., Volkovich Z., Akteke-Ozturk B. On a minimal spanning tree approach in the cluster validation problem. Informatica, Lith. Acad. Sci., 2009, vol. 20, no. 2, pp. 187–202.
* [20] Feng Y., Hamerly G. PG-means: learning the number of clusters in data. Advances in Neural Information Processing Systems, 2007, vol. 19, pp. 393–400.
* [21] Dudoit S., Fridlyand J. A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biology, 2002, vol. 3, no. 7, research0036. Available at: http://www.genomebiology.com/ (accessed: 17.08.2021).
* [22] Lange T., Roth V., Braun M. L., Buhmann J. M. Stability-based validation of clustering solutions. Neural Computation, 2004, vol. 16, no. 6, pp. 1299–1323.
* [23] Ben-Hur A., Elisseeff A., Guyon I. A stability based method for discovering structure in clustered data. Pacific Symposium on Biocomputing, 2002, vol. 7, no. 6, pp. 6–17.
* [24] Coates A., Ng A. Y. Learning Feature Representations with K-means. Stanford University, 2012.
* [25] Mirkes E.M. K-means and K-medoids applet. University of Leicester, 2011.
* [26] Dyuran B., Odell P. Cluster analysis. Moscow: Statistika, 1977, 128 p.
* [27] Kim J. O., Mueller C. W., Klecka W. R. Factorial, discriminant and cluster analysis. Moscow: Finansy i statistika, 1989, 215 p.