A research of classification algorithm of spatial information on the basis of methods of persistent homology and random forest

S V Eremeev¹, K V Kuptsov¹ and Yu A Kovalev¹

¹ Vladimir State University named after Alexander and Nikolay Stoletovs, Gorky street 87, Vladimir, Russia, 600000

Abstract. Classification of spatial data is one of the most difficult problems in the analysis and processing of spatial information. This article presents a new approach to solving it. The proposed object classification technique is based on algebraic topology, namely on methods of persistent homology. A barcode, computed from the topological features of the object being classified, serves as the descriptor of a spatial object. The distinctive feature of the proposed algorithm is its invariance to affine and topological transformations. The operation of the classification algorithm is studied on a set of spatial objects of different classes.

1. Introduction

Automatic digitization of maps is one of the major problems in geographic information systems [1, 2, 3]. It raises questions of identification [4] and classification of cartographic information. Classifying spatial data into object classes is one of the most difficult problems in the analysis and processing of spatial information. Researchers in Russia and worldwide have proposed a range of application-oriented solutions, and the published works solve the object classification problem with varying degrees of effectiveness.

There are different methods for classifying spatial objects. A method intended for medium-scale topographic maps is presented in [5]. Its main application is the classification of areas of objects under construction.
The method is based on geometric data structures and spatial analysis methods. Its advantage is improved quality in the automated processing of maps that contain areas of objects under construction.

The classification of spatial data is also relevant for monitoring information on the exhaustion of reservoirs or, on the contrary, on their degradation [6]. The technology is applied to spatial objects that have similar spectral features but different shapes. The algorithm has been implemented for classifying reservoirs in Alaska and is also used in Bolivia for classifying pastures.

In [7], image analysis is combined with network methods of information extraction for the creation of digital tourist maps. The algorithm classifies spatial objects according to rules of map simplification and generalization developed to emphasize reference points and to reduce the role of less significant objects. The technology has been applied to the creation of tourist maps of San Francisco.

IV International Conference on "Information Technology and Nanotechnology" (ITNT-2018)
Data Science
S V Eremeev, K V Kuptsov and Yu A Kovalev

In [8], satellite images or high-resolution images are processed to classify the objects they contain. The classification covers the main classes of objects shown on large-scale topographic maps.

An approach illustrated by digitizing distribution maps taken from plant-taxonomic atlases is described in [9]. As a result, plant distributions over Europe and Asia have been digitized. The algorithm is a tool for capturing data from maps based on obscure projections.

The purpose of this work is to create an algorithm for classifying cartographic information that classifies objects of various spatial classes with high quality and is invariant [10] to affine transformations and changes of scale.

2. A classification algorithm of spatial data on the basis of methods of persistent homology and random forest

The proposed object classification algorithm is based on algebraic topology, namely on methods of persistent homology. The application and analysis of topological characteristics is a new area of theoretical research for the analysis and processing of spatial information. Information obtained from aircraft is processed and analyzed. The extracted objects are assigned to spatial classes in accordance with the classification of spatial information.

A barcode is taken as the descriptor of the class of a spatial object. It is formed by computing the topological features of the classified object. First, the set of color intensity values of all points of the object is built and sorted in ascending order. The algorithm then searches, step by step, for vertices of the current intensity. Each point found is recorded in the list of vertices. If a vertex lies in the Moore neighborhood of a point that has already been recorded, the two are connected by a line. Three such vertices form a triangle.

With this approach, the number of components (vertices, lines and triangles) can change at each step of the algorithm. The appearance of a vertex adds a component. The appearance of a line connecting different components removes a component (two components merge into one).

The next stage of the algorithm is a pass in reverse order (by decreasing intensity), during which the number of holes and their lifetimes are counted. A hole is formed when a triangle appears. The filtration list for holes depends on the appearance of new components, their merging and other operations. The following step finds the maximum numbers of holes and lines. The barcode of the object image is computed from these numbers (fig. 1).

Figure 1. Barcodes: (a) the car and (b) the P-shaped building.
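The ascending pass described above can be illustrated with a minimal sketch. It is not the authors' implementation: it tracks only connected components (not lines, triangles or holes), it assumes a grayscale image given as a list of rows, and it assumes the standard "elder rule", under which the younger of two merging components is the one that disappears. The names `component_barcode` and `Bar` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Pixel = Tuple[int, int]
# (birth intensity, death intensity); death is None for a component that never merges
Bar = Tuple[int, Optional[int]]


def component_barcode(image: List[List[int]]) -> List[Bar]:
    """Component bars of an image swept by increasing intensity (sketch)."""
    rows, cols = len(image), len(image[0])
    # Intensity values of all points, sorted in ascending order.
    order = sorted((image[r][c], r, c) for r in range(rows) for c in range(cols))
    parent: Dict[Pixel, Pixel] = {}  # union-find forest over recorded points
    birth: Dict[Pixel, int] = {}     # intensity at which each point was recorded
    bars: List[Bar] = []

    def find(p: Pixel) -> Pixel:
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for value, r, c in order:
        p = (r, c)
        parent[p] = p
        birth[p] = value  # a new vertex adds a component
        # Moore neighborhood: the 8 surrounding points.
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                q = (r + dr, c + dc)
                if q == p or q not in parent:
                    continue
                a, b = find(p), find(q)
                if a == b:
                    continue
                # A line joining two components removes one of them.
                # Assumption (elder rule): the younger component dies.
                old, young = (a, b) if birth[a] <= birth[b] else (b, a)
                if birth[young] < value:  # skip zero-length bars
                    bars.append((birth[young], value))
                parent[young] = old

    # Components that survive to the end of the sweep.
    bars.extend((birth[root], None) for root in {find(p) for p in parent})
    return sorted(bars, key=lambda bar: (bar[0], bar[1] is None, bar[1] or 0))


if __name__ == "__main__":
    print(component_barcode([[1, 9, 2]]))
```

For the one-row image `[[1, 9, 2]]`, the two dark points start separate components at intensities 1 and 2, and the bright point between them joins the two at intensity 9, so the component born at 2 ends there while the one born at 1 survives. Counting holes on the descending pass would require an analogous sweep and is left out of this sketch.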
In fig. 1, the number of holes and the intensity of points in the RGB color model are shown on the X axis and the Y axis, respectively.

Whether an object belongs to a spatial class is determined by comparing the barcodes of two objects. Training on images of objects of different classes is carried out beforehand (fig. 2, a). The comparison checks whether the Betti numbers (the maximum numbers of holes and lines in the image of an object) fall within the range that characterizes objects of a spatial class (fig. 3).

The algorithm is complemented with the random forest method [11-13] to optimize its running time, which improves the speed of the algorithm. A decision tree is formed on the basis of this distribution (fig. 2, b). It is the result of the algorithm's work.

Figure 2. (a) Distribution of objects among classes (algorithm training): A1 – cars (if Y