Description Models, Methods, Algorithms, and Technology for Processing Poorly Structured Raster Graphic Documents*

Dmitry Vasin

ITMM, Federal State Autonomous Educational Institution of Higher Education, National Research Lobachevsky State University of Nizhny Novgorod, Russia
dm04@list.ru

Abstract. The paper describes an original, highly efficient technology for processing large-format, complex structured graphic documents with a poorly formalized description of objects, based on original low-level raster and vector representation models. The work continues the development of the combinatorial geometric approach to processing geographically distributed data. We propose original data structures and algorithms, developed on this basis, for solving topical problems of preprocessing, compression, and recognition of raster images of graphic documents of this type.

Keywords: Raster Data Representation Models, Large-Format Complex Structured Graphic Documents, Hyperspectral Earth Remote Sensing (ERS) Raster Data, Compression of Raster Data, Pattern Recognition, Automation of Drawing and Design Documentation Processing, Raster Graphic Data Processing Technology.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

* The work was performed with financial support of the Russian Science Foundation (Grant No. 16-11-00068) and the Russian Foundation for Basic Research (Grant No. 18-07-00715).

1 Introduction

The information technology (IT) industry is one of the most dynamically developing industries in Russia and worldwide. The strategy [1] formulates a unified state approach to the development of the IT industry as the most important priority of Russia's development for a period of 10-15 years, defining "the transition to advanced digital, intelligent production technologies, robotic systems, new materials and methods of construction, the making systems for processing large amounts of data, machine learning and artificial intelligence". Its section on fundamental, exploratory, and applied research highlights technologies for monitoring and forecasting the state of the environment, for detecting and eliminating the consequences of man-made and natural disasters and conducting rescue operations, space technologies, and technologies for acquiring, processing, storing, searching, recognizing, and mining large amounts of data and information and extracting knowledge from them, including new methods, algorithms, and software for these tasks.

To a large extent, the intelligence of advanced production technologies and information systems is associated with the expansion and qualitative change of video data as a channel for obtaining information. Large volumes of such data accumulate in ensuring safety, in preventing and assessing damage from emergencies, and in the activities of various industries, agriculture, science, health care, and social services. The basic foundation for preventing emergency situations (ES) is monitoring and management based on the collection and use of a wide range of spatially distributed data (SDD) and of ERS data from spacecraft. Obtaining and efficiently using this enormous volume and variety of acquired information requires solving a complex of important scientific and technical tasks: developing methods, algorithms, and breakthrough technologies for GIS integration, storage, transmission, analysis, and recognition of a wide spectrum of ERS data (including the latest hyperspectral data), as well as of the accompanying thematic information on spatially distributed objects. These tasks must be solved to build intelligent GIS for integrated monitoring, prevention, evaluation, and minimization of the consequences of ES, and to improve the efficiency of managing geographically linked economic and social processes.
At the same time, the development of new types of GIS and navigation systems is among the priority research areas that determine the vector of technological development in IT, in which competitiveness can be increased in a relatively short time.

A significant subclass of large-format, complex structured graphic documents (LFGD) is formed by such common document types as engineering drawings, diagrams, building floor plans, and raster images of ERS (RIERS), including the latest hyperspectral data raster images (HSDRI). Such documents are usually executed manually on hard copy of medium or poor quality, or are obtained from space sensors and have low contrast. They are characterized by a wide variety of sign content, a high density of graphic signs on the document surface, serious deviations from the standard depiction of graphic objects, arbitrary orientation of objects, a variety of topological relationships between them, etc. Depending on the quality of the LFGD medium and on the right choice of digital capture parameters, the noise level of the raster image (RILFGD) and the metric accuracy of individual graphic symbols vary over a wide range. These features make it possible to single out the LFGD as a separate subclass of graphic documents with a poorly formalized description of objects (PFGD) and impose increased demands on their geometric modeling. Currently, one of the most popular types of PFGD is images obtained from the ISS. The global trend in ERS development is the use of hyperspectral systems with unique capabilities for detecting and analyzing detailed properties of observed objects.
These systems provide an effective solution to a wide range of tasks of operational monitoring of the Earth's surface in various fields of human activity: hydrometeorology, cartography, geology, forecasting, detection and monitoring of ES, environmental analysis, etc. However, experts have identified a number of unsolved scientific, technical, and organizational problems that significantly hold back the use of ERS data in the innovative economy, science, and education [14]. These problems impose increased requirements on the time and memory efficiency of algorithms for preprocessing, searching, storing, and complex analysis of HSDRI and of heterogeneous thematic graphical and semantic SDD information. They call for new effective models, methods, and algorithms, for high-capacity software and technologies for processing and analyzing heterogeneous information, and for object-oriented DBMS with information compatibility and intelligent processing. It is therefore relevant to further improve the methodological, algorithmic, and software basis for automated processing of raster images of PFGD (RIPFGD).

The purpose of the work is to further develop mathematical description models, methods, and effective algorithms for solving a wide range of RIPFGD processing problems, and to create on their basis the corresponding information, software, and technological support for automated PFGD processing systems. The object of the investigation is naturally diverse PFGD formed by various technical devices and systems, as well as raster images obtained by converting paper archival documents of the PFGD class into digital form. The subject of the research is the automatic and automated processing of such information.
2 Problems of creating automatic technologies and systems for processing PFGD

In recent years, various information technologies have been proposed to automate the input of PFGD. They are based on heuristic procedural methods or on supervised recognition methods and are effective only for a limited set of objects with strict restrictions on their size and orientation. We note that automatic analysis of PFGD is a complex, multistage process that includes a large number of processing methods and algorithms: filtering, compression, storage and search, analysis, and decision making. It is very important that all the mathematical models, methods, algorithms, and data representation structures that make up the technology be interconnected and mutually efficient, since high efficiency at any particular processing stage can be negated at other stages. The huge information redundancy of RIPFGD increases the requirements for automatic processing algorithms. The problem is aggravated by the fact that automatic processing of RIPFGD at the lower levels of the hierarchy must be carried out in real time with limited memory, and the developed models and methods must be integrated into existing technologies and systems. Therefore, the models and methods developed for processing RIPFGD should meet the general requirements predetermined by the efficiency of solving graphical information (GI) analysis problems in general [2]:
• technological effectiveness;
• high efficiency in memory and time;
• natural integrability into the general processing scheme.
To avoid duplicated effort when developing systems that solve various PFGD processing problems, it is reasonable to single out a basic system that serves two purposes: on the one hand, it is the basis for developing systems specialized for a particular subject area; on the other hand, it is an automated workstation for developing and studying PFGD processing algorithms. The combinatorial geometric approach (CGA) to processing raster images of SDD (RI SDD), proposed in the 1980s, can be put at the core of such a system. It is based on a hierarchy of mathematical models for image description, a hierarchy of data representation structures, a set of time- and memory-efficient algorithms for solving computational geometry problems, and specialized algorithms for processing video data [3]. The essence of the approach is as follows: a vector model is built from the original RI SDD, i.e., the original RI SDD is mapped to a set of points, polygons, and polylines. On the basis of this representation, a hierarchy of interrelated mathematical models of description, representation structures, and decision making is formed, in which objects are also treated as points, polygons, polylines, and their aggregates. Objects in the hierarchy of image models are constructed using a system of logical geometric predicates (decision rules) that compute characteristics of objects and relationships between them: dimensions, distances, enclosures, adjacencies, intersections, other relationships of mutual position, and shape characteristics of objects and their parts. For higher-level models, effective methods were developed for computational geometry (the general-to-specific method based on hierarchical structures for representing vector data) and for recognizing graphic objects (the correlation-extreme method), among others [4–6].

As a result, the entire complex set of video data analysis tasks is considered from a single point of view: building a hierarchy of interrelated mathematical models of description, representation structures, and decision making, whose lower level processes raster information from the original source of visual data and whose upper level describes the graphical data at the content level in the user's terms. Further enhancing the intelligence of information technologies for automatic processing of RIPFGD is therefore an important task, and it has led to the need to further develop the CGA, the hierarchy of description models, and the RIPFGD processing methods and algorithms built on them.

3 Effective low-level models for the representation of RIPFGD

In the framework of the CGA, a mathematical model of an image is understood as the triple $M^\alpha = \langle E^\alpha, C^\alpha, R^\alpha \rangle$, where $E^\alpha = \{e_1^\alpha, e_2^\alpha, \dots, e_s^\alpha\}$ is a set of non-derivative elements (NE) of the model of rank α; $R^\alpha = \{r_1^\alpha, r_2^\alpha, \dots, r_t^\alpha\}$ is a set of admissible relations between the non-derivative elements of the model of rank α; and $C^\alpha = \{c_1^\alpha, c_2^\alpha, \dots, c_n^\alpha\}$ is the set of characteristics of the non-derivative elements of the model of rank α, α = 1, 2, 3, ..., N. An image P in the model $M^\alpha$ is a collection of image objects $O^\alpha = \{O_1^\alpha, O_2^\alpha, \dots, O_k^\alpha\}$, where each $O_i^\alpha$, i = 1, 2, ..., k, is a subset of non-derivative elements $e_\gamma^\alpha \in E^\alpha$, γ = 1, 2, ..., s, that have characteristics $c_\beta^\alpha \in C^\alpha$, β = 1, 2, ..., n, and are linked by relations $r_\delta^\alpha \in R^\alpha$, δ = 1, 2, ..., t, with other objects $O_\varphi^\alpha \in O^\alpha$, φ = 1, 2, ..., k [2, 3, 7, 8, 12]. A higher-rank model is built from a lower-rank model by creating new NE from the NE of the lower-rank model using its sets of relations $R^\alpha$ and characteristics $C^\alpha$.

The class of raster models of PFGD (RMPFGD).
The primary model for representing RIPFGD is the raster pixel model (PMI), which has significant information redundancy. To reduce the information volume of binary raster images of PFGD (BRIPFGD), it is proposed to use a raster-vector stroke model (SMI), where a stroke is a one-dimensional cluster of connected pixels of the same color along a scan line. For BRIPFGD it is necessary and sufficient to form strokes of only one of the two original colors. We note that for BRIPFGD the SMI is on average L times more compact than the PMI, where L is the average stroke length in a given RI. As for the PMI, conditions defining the topological properties of strokes are defined for the SMI [2, 7, 8, 12].

The practice of processing PFGD, especially archival ones, and HSDRI raised new serious problems associated with the transition from the lower (pixel) representation to the vector one; given the large volumes of processed data and the time costs, this led to the need to further develop the RMPFGD hierarchy. Based on an analysis of the topology of BRIPFGD strokes, the RMPFGD class was extended. It is proposed to distinguish special strokes: isolated (Si), beginning (Sb) and end (Se) of a raster object (RO), splitting (Ss) and merging (Sm) of an RO, and splitting-and-merging (Ssm) of an RO. Strokes of types Ss, Sm, and Ssm are called nodal [2, 7, 8, 12]. This classification of graphic situations extends the RMPFGD hierarchy with new object models: the raster simple object (RSO) and the raster composite object (RCO). An RSO is extracted from the stroke description alone, while an RCO is extracted from both the stroke and pixel descriptions of the BRIPFGD.

An important stage of automatic BRIPFGD processing, which improves the quality of the generated vector models and, consequently, of the subsequent automatic recognition procedures, is segmentation of the original BRIPFGD by a geometric criterion.
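The stroke model and the special-stroke classification described above can be illustrated with a minimal sketch. It assumes a binary image given as a list of rows of 0/1 values, horizontal scan lines, and 8-connectivity between strokes in adjacent rows. The names `Stroke`, `extract_strokes`, `classify_strokes`, the label "regular" for ordinary strokes, and the priority order used when a stroke fits several types (for example, a beginning stroke that immediately splits is labeled Sb) are hypothetical choices for illustration, not the author's definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stroke:
    row: int    # scan line (image row) index
    start: int  # first column of the run, inclusive
    end: int    # last column of the run, inclusive


def extract_strokes(image):
    """Encode every row of a binary image as runs (strokes) of 1-pixels."""
    strokes = []
    for r, row in enumerate(image):
        c = 0
        while c < len(row):
            if row[c] == 1:
                start = c
                while c < len(row) and row[c] == 1:
                    c += 1
                strokes.append(Stroke(r, start, c - 1))
            else:
                c += 1
    return strokes


def average_stroke_length(strokes):
    """Average run length L, the compactness factor of SMI over PMI."""
    return sum(s.end - s.start + 1 for s in strokes) / len(strokes)


def touches(a, b):
    """8-connectivity test for strokes lying in adjacent rows."""
    return a.start <= b.end + 1 and b.start <= a.end + 1


def classify_strokes(strokes):
    """Label each stroke Si, Sb, Se, Ss, Sm, Ssm or 'regular' (assumed rules)."""
    by_row = {}
    for s in strokes:
        by_row.setdefault(s.row, []).append(s)
    labels = {}
    for s in strokes:
        up = sum(touches(s, t) for t in by_row.get(s.row - 1, []))
        down = sum(touches(s, t) for t in by_row.get(s.row + 1, []))
        if up == 0 and down == 0:
            labels[s] = "Si"   # isolated stroke
        elif up == 0:
            labels[s] = "Sb"   # beginning of a raster object
        elif down == 0:
            labels[s] = "Se"   # end of a raster object
        elif up >= 2 and down >= 2:
            labels[s] = "Ssm"  # merging and splitting at once
        elif down >= 2:
            labels[s] = "Ss"   # the object splits below this stroke
        elif up >= 2:
            labels[s] = "Sm"   # branches merge into this stroke
        else:
            labels[s] = "regular"
    return labels
```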
For geometric segmentation, each pixel of the original BRIPFGD is assigned an extended connectivity feature $L_{sw}$, the minimum length of the stroke obtained over all possible scan-line directions passing through that pixel. A simple threshold scheme then splits the set of RSO into linear (RSOL) and areal (RSOA) objects: if the condition $L_{sw} \le P_{or}$ holds for the current RSO pixel, the pixel belongs to an RSOL, otherwise to an RSOA, where $P_{or}$ is a selected threshold value. This in turn allows us to move to the linear-areal model (LARMI), i.e., the representation of the original BRIPFGD as the logical sum of two images, one consisting of RSOL pixels and the other of RSOA pixels. Thus, the extended RMPFGD class comprises the PMI, the SMI, the RSO model, the RCO model, and the LARMI [2, 7, 8, 12].

Fig. 1. Structured description of BRIPFGD

The proposed RMPFGD hierarchy provides a structured representation of a BRIPFGD as a set of connected raster components (CRC): RSO, RCO, and the beginning/end and nodal strokes. The set of NE consists of RSOL, RSOA, Sb, Se, Sm, Ss, and Ssm. Each RSO is adjacent to these strokes (Fig. 1). The constructed RMPFGD hierarchy makes it possible to:
• structure RIPFGD;
• process RO in parallel;
• extend the class of RIPFGD processing methods used at the lower description levels;
• use not only local but also integral processing criteria;
• recognize linear, areal, and discrete RO;
• reduce the memory and computational complexity of RIPFGD processing algorithms.

The class of vector models of PFGD (VMPFGD). We propose an improved class of vector-level models as a further development of the RMPFGD hierarchy presented above.
In addition to the contour (CMI) and linear-contour (LCMI) models, a linear model (LMI) and a segment-node model (SNMI) derived from it are proposed; their detailed description is given in [2, 7–10, 12]. The SNMI is particularly effective for processing PFGD with a pronounced topological load on the linear geometric elements of the image. A topological feature of such RIPFGD is the massive presence of straight RSOL that intersect one another. A topological model is determined by the presence and storage of sets of relationships, such as the connection of arcs at intersections, the ordered set of segments forming the boundary of each contour, and so on; the topological properties of figures do not change under any deformation performed without tearing or gluing [9, 10]. Unlike the object vector models of PFGD, the SNMI does not contain objects in the usual sense, such as contours, vectorized lines, or segments, and the transition from the SNMI to the level of object vector models is a separate task. CMI, LCMI, and LMI are associated with various CRC vectorization algorithms and provide a geometric interpretation of images in scene analysis and recognition tasks, as well as a metric description of the raster information components. Thus, the extended VMPFGD class consists of the CMI, LCMI, LMI, and SNMI [2, 7, 8, 12].

We note that the VMPFGD obtained by automatic procedures contain objects that do not always correspond to their reference description and have specific features, both in their composition and in the way they are specified. In addition, the practice of processing PFGD, especially archival ones, has revealed new serious problems related to the transition from the lower (pixel) level of representation to the vector one.
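The segment-node idea for documents with many mutually intersecting straight RSOL can be illustrated under a hypothetical reading of the SNMI, assumed here only for illustration: straight RSOL are given as line segments, every pairwise crossing becomes a node, and each segment is cut into node-to-node pieces. The functions `intersection` and `segment_node_model`, the coordinate rounding used to merge coincident nodes, and the brute-force pairwise search are our assumptions; the actual SNMI construction is described in [2, 7–10, 12].

```python
from itertools import combinations


def intersection(p1, p2, p3, p4, eps=1e-9):
    """Crossing point of closed segments p1p2 and p3p4, or None.

    Parallel and collinear pairs are ignored in this sketch.
    """
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    sx, sy = p4[0] - p3[0], p4[1] - p3[1]
    denom = rx * sy - ry * sx  # cross product of the two directions
    if abs(denom) < eps:
        return None
    qx, qy = p3[0] - p1[0], p3[1] - p1[1]
    t = (qx * sy - qy * sx) / denom  # relative position along p1p2
    u = (qx * ry - qy * rx) / denom  # relative position along p3p4
    if -eps <= t <= 1 + eps and -eps <= u <= 1 + eps:
        return (p1[0] + t * rx, p1[1] + t * ry)
    return None


def segment_node_model(segments, ndigits=6):
    """Return (nodes, edges): endpoints plus crossings, and node-to-node pieces."""
    points_on = [[a, b] for a, b in segments]
    for (i, (a, b)), (j, (c, d)) in combinations(enumerate(segments), 2):
        p = intersection(a, b, c, d)
        if p is not None:
            points_on[i].append(p)
            points_on[j].append(p)
    nodes, edges = set(), set()
    for (a, b), pts in zip(segments, points_on):
        dx, dy = b[0] - a[0], b[1] - a[1]
        # Round to merge numerically coincident nodes, then order along the segment.
        unique = {(round(x, ndigits), round(y, ndigits)) for x, y in pts}
        ordered = sorted(unique, key=lambda p: (p[0] - a[0]) * dx + (p[1] - a[1]) * dy)
        nodes.update(ordered)
        edges.update(tuple(sorted(pair)) for pair in zip(ordered, ordered[1:]))
    return nodes, edges
```

For a "+" shape made of one horizontal and one vertical segment, the sketch yields five nodes (four endpoints and the crossing) and four edges. A real implementation would replace the quadratic pair search with a sweep-line or spatial index.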
The problems of this pixel-to-vector transition sharply complicate automatic object recognition on PFGD and inevitably reduce the time efficiency of the entire processing technology for this class of documents, since possible errors must be checked and edited interactively. This in turn makes it relevant to create modern, intelligent, interactive graphical tools that support the construction of VMPFGD.

4 Geometric modeling of PFGD

For the stroke representation of BRIPFGD, geometric modeling in the form of the CMI was carried out by a modified algorithm [2] which, by using information about RSO and RCO, reduced space complexity by a factor of 4-5 and time complexity by a factor of 2-3 compared with the original algorithm [7]. The LMI is formed from RSOL by approximating their axis lines with the method of least squares [9]. The LCMI is based on the LARMI: its linear elements are formed by RSOL, and its contour elements by RSOA. Further application of original discrete geometry methods to the lines and contours [5, 6, 13] makes it possible to synthesize a full-fledged LCMI. The time characteristics of these algorithms are similar to those of the CMI construction algorithms.

As a further development of the CGA, of the original hierarchical raster and vector SDD representation models, and of existing methods and algorithms, an effective technology was developed for building a geometric model (GMPFGD) from the set of RCO of a BRIPFGD, providing high accuracy in the semantic decoding of the document [12]. The input of the proposed technology is a BRIPFGD, each pixel of which takes the value

$$R = \begin{cases} 1, & \text{if the pixel belongs to the sign layer},\\ 0, & \text{if the pixel belongs to the background layer}. \end{cases}$$

We form the required GMPFGD as a collection of sets of NE: topological nodes U, segments S, and contours K:

$$U = \{U_i\},\ i = 1, 2, \dots, N_u; \quad S = \{S_i\},\ i = 1, 2, \dots, N_s; \quad K = \{K_i\},\ i = 1, 2, \dots, N_k.$$
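A minimal sketch of a container for the sets U, S, and K, together with one way to turn the axis pixels of an RSOL into an element of S. The text states only that axis lines are approximated by the method of least squares; here we assume an orthogonal (total) least-squares fit along the principal axis of the point scatter, which, unlike regressing y on x, also handles vertical lines. The names `GeometricModel` and `fit_axis_segment` are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class GeometricModel:
    """Hypothetical container for the non-derivative elements of a GMPFGD."""
    nodes: list = field(default_factory=list)     # U: topological nodes (x, y)
    segments: list = field(default_factory=list)  # S: ((x1, y1), (x2, y2))
    contours: list = field(default_factory=list)  # K: closed polygons [(x, y), ...]


def fit_axis_segment(points):
    """Orthogonal least-squares line through axis pixels, clipped to their extent."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # Angle of the major principal axis: the line through the centroid that
    # minimizes the sum of squared perpendicular distances to the points.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    dx, dy = math.cos(theta), math.sin(theta)
    # Project the points onto the line to find the segment's endpoints.
    ts = [(x - mx) * dx + (y - my) * dy for x, y in points]
    t0, t1 = min(ts), max(ts)
    return (mx + t0 * dx, my + t0 * dy), (mx + t1 * dx, my + t1 * dy)


model = GeometricModel()
model.segments.append(fit_axis_segment([(0, 1), (1, 3), (2, 5)]))  # points on y = 2x + 1
```

Here the fitted segment runs from (0, 1) to (2, 5), up to floating-point rounding. Fitting perpendicular rather than vertical residuals is a design choice that suits drawings where lines can have any orientation.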
Figure 2 shows the elements of U, S, and K against the background of the BRIPFGD, where:
• areas 1 are the background raster layer;
• areas 2 are the sign raster layer;
• lines and nodes 3 are topological nodes and short segments;
• lines 4 are segments of axis lines;
• lines 5 are contours of areal objects.

Fig. 2. Non-derivative elements of the geometric image model

The two-layer BRIPFGD model and the selected GMPFGD make it easy to extend existing methods and algorithms for the joint vectorization of RCO to other documents, including full-color RIERS and HSDRI that admit color (spectral) separation (clustering).

Structural analysis of the sign layer of a BRIPFGD shows that it contains the following RSO:
• noise of fairly small geometric size ("snow");
• small images representing elements of a set of discrete characters;
• large images representing isolated linear and areal signs, or conglomerates resulting from overlapping (merging, touching) linear, discrete, and areal signs.

To support the full variety of algorithms for classifying the sign content of graphic documents, the set of vector elements obtained in this way is supplemented, for the current document, with a description of its sign pixel layer in stroke format. This makes it possible to construct sign recognition algorithms based on combined, consistent, synchronous vector and pixel descriptions, a feature that distinguishes the proposed methods and algorithms for describing RIPFGD from existing ones. Figure 3 shows a fragment of the GMPFGD of a topographic map with a pronounced load of linear objects, obtained by applying this technology.

Fig. 3. Fragment of the GMPFGD of a terrain map

Applying the technology to form the GMPFGD of engineering drawings made it possible to automatically identify areas of topological interaction of various objects.
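As an illustration of how candidate areas of topological interaction might be flagged automatically, the sketch below detects one simple kind of suspicious situation: endpoints of different linear segments that lie close to each other but do not coincide, i.e., a possible unauthorized break. The function name `find_suspicious_gaps`, the threshold `max_gap`, and the tolerance `tol` are assumptions made for the example; the technology described here also covers unauthorized intersections and interactions with second-order curves, which this sketch does not attempt.

```python
import math
from itertools import combinations


def find_suspicious_gaps(segments, max_gap=3.0, tol=1e-6):
    """Flag endpoint pairs of different segments that almost, but not quite, meet.

    Returns (i, j, p, q, distance) tuples to be shown to the operator.
    """
    endpoints = [(i, p) for i, seg in enumerate(segments) for p in seg]
    hits = []
    for (i, p), (j, q) in combinations(endpoints, 2):
        if i == j:
            continue  # the two ends of the same segment
        d = math.dist(p, q)
        if tol < d <= max_gap:  # coincident ends are treated as a proper junction
            hits.append((i, j, p, q, d))
    return hits
```

Each hit marks a small region to present to the operator rather than an error to fix automatically, matching the semi-automatic workflow in the text. A production version would use a spatial index instead of the quadratic pair loop and would also test endpoints against segment interiors.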
Identifying areas of topological interaction made it possible to automate the search for potentially erroneous drawing areas in terms of the geometric interaction of objects: unauthorized breaks or intersections of linear objects, and missing or unauthorized interactions between linear objects and second-order curves (arcs). Possible "areas of interest" are selected automatically and then presented to the operator, who makes the final decision while interactively analyzing these areas in a graphical editor. The technology provides a file interface with the CAD system "Compass" [11]. In general, in the author's opinion, it is justified to use automated PFGD processing systems in which relatively simple but mass operations are performed automatically, while the final decision is entrusted to an operator. The efficiency of this technology is mainly due to the fact that detailed interactive analysis is applied not to the entire PFGD but only to its individual fragments. Owing to parametric tuning of the algorithms and to the possible segmentation of the original RIPFGD into linear and areal objects, the error search can be iterative. If the BRIPFGD of a vectorized object contains no distortion or noise, existing local vectorization algorithms cope satisfactorily with the automatic formation of the GMPFGD. It should be borne in mind, however, that the resulting vectorized objects require additional smoothing or approximation, which cannot always be done successfully, i.e., while meeting both the requirement of metric approximation accuracy and that of geometric accuracy of the vectorized object [2, 3, 7-10, 12].

5 Automatic character recognition on PFGD

A peculiarity of PFGD is that, despite deviations from the regulatory requirements for depicting objects, they have a certain stylized form of representation.
This means that a set of object templates can be created, based on low-level models of graphic images that are maximally adapted to this class of documents, taking into account the strong dependence of the effectiveness of classification features on input data distortion. As new documents become available, this set can be extended accordingly. With this in mind, an effective character recognition technology was developed based on the original low-level raster representation model of RIPFGD. The following discriminant recognition features (DSR) are used: the number of interior regions (holes); the width-to-height ratio of the bounding rectangle of the symbol with sides parallel to the coordinate axes; the ratio of the pixel area of the symbol to the area of the bounding rectangle; features computed from the RSO, namely the number of strokes in the RSO and the average stroke length of the current RSO; special strokes (Sb, Se, Ss, Sm); the coordinates of the center of gravity of the RCO forming the character; and the values of moment invariants. Moment invariants are the most important tool for pattern recognition that is invariant with respect to affine transformations. Their insensitivity to image rotation makes them effective features for detecting and recognizing objects of unknown orientation [15-19]. This set of DSR is not final and can be extended depending on the type of PFGD being processed. A hierarchical system of decision rules for classifying symbols was synthesized on the basis of the DSR, which significantly increased the efficiency of their automatic identification on PFGD [17–19]. Practical experiments on character recognition in various English and Russian PFGD showed a recognition quality of 97% to 99%.
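Several of the listed DSR can be computed directly from a symbol's pixels. The sketch below, assuming the symbol is given as a list of (x, y) coordinates of its sign-layer pixels, computes the bounding-box aspect ratio, the fill ratio, the center of gravity, and the first two Hu moment invariants φ1 and φ2 from normalized central moments. The function name is hypothetical, and Hu's invariants are our choice for the example: they are invariant to translation, scale, and rotation only, so full affine invariance would require affine moment invariants instead. The paper does not state which invariants its technology uses.

```python
def shape_features(pixels):
    """A few DSR-style features of a symbol given as (x, y) pixel coordinates."""
    n = len(pixels)
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    cx, cy = sum(xs) / n, sum(ys) / n  # center of gravity

    def eta(p, q):
        """Normalized central moment; mu_00 equals the pixel count n."""
        mu = sum((x - cx) ** p * (y - cy) ** q for x, y in pixels)
        return mu / n ** (1 + (p + q) / 2)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    return {
        "aspect": width / height,                 # bounding-box width-to-height ratio
        "fill": n / (width * height),             # pixel area / bounding-box area
        "centroid": (cx, cy),
        "phi1": e20 + e02,                        # Hu invariant 1
        "phi2": (e20 - e02) ** 2 + 4 * e11 ** 2,  # Hu invariant 2
    }
```

A horizontal and a vertical three-pixel bar get different aspect ratios (3 and 1/3) but identical φ1 and φ2, which is the rotation insensitivity the text relies on.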
Where recognition quality is lower, depending on the PFGD type, this is explained by the presence of "scattered" symbols and by characters overlapping or touching thin linear objects. If such interference is widespread, then before automatic recognition starts it is necessary either to use specialized automated tools for correcting the PFGD or to edit the document interactively in a graphical editor. In general, the values of moment invariants are fairly stable: after applying them, 2 to 5 candidate classes remain to which the recognized character may belong. For the final decision on assigning an object to a class, features obtained from the stroke description are used, which are also sufficiently stable [17–19].

6 Regular encoding methods of HSDRI

Most modern ERS systems are multi-channel, where multi-channel mode refers to forming images of the same surface area using multiple frequencies, polarizations, viewing angles, etc. One of the most promising types of multi-channel ERS systems is hyperspectral imaging, which covers the optical and near-infrared ranges of electromagnetic waves with a spectral resolution on the order of several nanometers and a spatial resolution from units to tens of meters, simultaneously forming hundreds of practically co-registered HSDRI. Examples of such systems are AVIRIS, HYDICE, Hyperion, CASI, CHRIS-PROBA, and others [20]. In transmitting, storing, and processing HSDRI, the central problem is the huge amount of data that must be transmitted through communication channels and processed [20-26]. Developing existing and finding new methods of HSDRI compression is therefore a relevant task. When encoding large amounts of experimental data, methods based on variance and factor analysis are widely used.
At the same time, there are basis functions that are in some sense adapted to encoding the data in question. The basis functions obtained by the method of principal (orthogonal) components are optimal in the sense of mean-square error, and when the encoded data are pre-normalized in duration and energy, they are optimal in the sense of the minimum number of expansion coefficients [27]. However, from a practical point of view these methods are computationally expensive and require considerable memory, since the eigenvectors of covariance matrices obtained from a set of HSDRI must be calculated. Currently, there are two classes of compression algorithms: lossless ones and lossy ones, the latter providing a somewhat higher compression ratio than the methods of the first group. For lossy methods, the degree of distortion of the source data is determined by the approximation accuracy ε set as a parameter. Lossy compression methods are based on decomposing the source signals in a particular system of basis functions (SBF) with a given approximation accuracy ε. The problem of optimal HSDRI encoding is then reduced to searching for an SBF that, for a given standard error δ, provides the minimum (or close to minimum) number m of basis functions $\varphi_1(t), \varphi_2(t), \dots, \varphi_m(t)$. The process f(t), $t_1 \le t \le t_2$, can then be approximately represented as

$$\tilde f(t) = \sum_{k=1}^{m} C_k \varphi_k(t).$$

The coefficients $C_1, C_2, \dots, C_m$ are considered as the code of the curve f(t), and the approximation error is $\Delta(t) = f(t) - \tilde f(t)$. Obviously, different types of hyperspectral images (HSI) require different optimal SBF.

In order to reduce the computational complexity of the HSDRI compression algorithm, a quasi-optimal lossy compression algorithm is proposed, based on forming "well-adapted" basis functions.
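The generic lossy scheme above (expand f in an SBF, keep the coefficients $C_k$ as the code, and choose the smallest m that meets the accuracy requirement) can be sketched for a single sampled spectrum. The orthonormal DCT-II basis used here is only a stand-in SBF chosen for simplicity, not the "well-adapted" basis proposed in the paper, and the accuracy is checked as the maximum absolute deviation at every sample, i.e., a per-point bound rather than a mean-square one. All function names are hypothetical.

```python
import math


def dct_basis(length, m):
    """First m orthonormal DCT-II basis vectors (a stand-in SBF)."""
    basis = []
    for k in range(m):
        scale = math.sqrt((1 if k == 0 else 2) / length)
        basis.append([scale * math.cos(math.pi * (i + 0.5) * k / length)
                      for i in range(length)])
    return basis


def encode(signal, basis):
    """Coefficients C_k = <f, phi_k>; valid because the basis is orthonormal."""
    return [sum(f * p for f, p in zip(signal, phi)) for phi in basis]


def decode(coeffs, basis):
    """Approximation f~ = sum_k C_k * phi_k."""
    return [sum(c * phi[i] for c, phi in zip(coeffs, basis))
            for i in range(len(basis[0]))]


def minimal_code(signal, eps):
    """Smallest m whose reconstruction error |f - f~| is <= eps at every sample."""
    n = len(signal)
    full = dct_basis(n, n)
    for m in range(1, n + 1):
        coeffs = encode(signal, full[:m])
        approx = decode(coeffs, full[:m])
        if max(abs(f - g) for f, g in zip(signal, approx)) <= eps:
            return m, coeffs
    return n, encode(signal, full)  # only reached if eps is below rounding noise
```

A constant spectrum needs a single coefficient, while a ramp needs more; the ratio m/n is what a better-adapted basis would try to reduce for a given ε.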
This method provides an encoding error that is not greater than the one specified for all points of the convex hull of the original set of vectors, while methods based on the ideology of the main components provide a sufficiently small error on average for the entire source set. The proposed method allows for a fairly simple practical implementation for large dimensions of the source data [28]. We note that when encoding is usually not as important possible greater accuracy of approximation ε with a given number m of basis vectors, how to minimize the number m of basis vectors for a given precision of approximation ε. It is proved that for a given accuracy ε, the "well-adapted" basis does not include at most the last three orts compared to the optimal encoding on average [29]. Practical experiments on real sets of HSDRI have shown that the use of this method allows you to bring the compression ratio to 90-95% with adequate preservation of the quality of the restored HSDRI. 7 Conclusion The results of the work: • a new approach to the compression RIPFGT type of HSDRI remote sensing- based quasi-optimal “well-adapted” SBF is proposed and it allows us to compress HSDRI mode control-wise specified maximum error, with a substantial reduction in computational complexity compared to the classic principal components analysis; • a DSR system based on the original hierarchical models for describing the raster level and a hierarchical system of decisive classification rules for structural methods of character recognition on RIPFGD of various nature is proposed, which allows significantly reducing the computational complexity of the developed recognition algorithms in comparison with classic analogues; • classification of the most common metric and topological errors on RIPFGD type 2D drawings of projection types of engineering parts is proposed; • the technology of automatic conversion of RIPFGD type 2D drawings of projection types of engineering parts into a topological 
vector representation is proposed in order to solve the problem of automatically converting an original (paper) set of 2D projection drawings of engineering parts into a 3D model. The technology is based on original RIPFGD representation models and on algorithms with high computational efficiency;
• an automatic technology for quasi-optimal encoding of remote sensing HSDRI based on a “well-adapted” SBF is proposed, which achieves record compression ratios (up to 95%) with adequate preservation of the quality of the restored rasters. The developed methods and algorithms for adaptive compression of HSDRI are new and meet world standards;
• on the basis of the original DSR system and the hierarchical system of classification decision rules, an automatic symbol recognition technology for RIPFGD of various kinds is proposed, which automatically recognizes up to 98% of objects with high computational and time efficiency.

References

1. “Strategy of Scientific and Technological Development of the Russian Federation”, approved by Decree of the President of the Russian Federation of 01.12.2016 No. 642.
2. Vasin D. Yu. Research of Description Models, Development of Algorithmic, Software and Technological Support for Processing Raster Images of Graphic Documents // Dissertation for the degree of Candidate of Technical Sciences. Nizhny Novgorod, 2006. (in Russian)
3. Vasin Yu. G., Bashkirov O. A., Chudinovich B. M. Combinatorial Geometric Approach in Complex Graphic Data Analysis Tasks // Automation of Processing Complex Graphic Information: Inter-University Collection of Scientific Papers / ed. by Vasin Yu. G. Gorky: Gorky State University, 1987. (in Russian)
4. Lebedev L. I. Correlation-Extreme Contour Recognition Methods. Theoretical Foundations: Textbook. Nizhny Novgorod: Nizhny Novgorod State University Publishing House, 2013. 113 p. (in Russian)
5. Vasin Yu. G. Models, Methods, Tools for Creating, Storing, Processing and Analyzing Spatially Distributed Data in GIS // In collection: GRAPHON 2016. Proceedings of the 26th International Scientific Conference. 2016. P. 14-21. (in Russian)
6. Vasin Yu. G., Utesheva T. Sh. Improving the Efficiency of Technological Processes for Processing and Analyzing Graphical Data in GIS // In collection: GRAPHON 2015. Proceedings of the 25th Anniversary International Scientific Conference. 2015. P. 68-71. (in Russian)
7. Vasin Yu. G., Bashkirov O. A., Rudometova S. B. Mathematical Models for Structured Description of Graphic Images // Automation of Processing Complex Graphic Information: Inter-University Collection of Scientific Papers / ed. by Vasin Yu. G. Gorky: Gorky State University, 1984. (in Russian)
8. Vasin D. Yu., Gromov V. P., Rotkov S. I. Models of Representation of Raster Graphic Documents with a Poorly Formalized Description of Objects // In collection: Proceedings of the International Conference on Computer Graphics and Vision GraphiCon. 2018. No. 28. P. 337-347. (in Russian)
9. Vasin Yu. G., Vasin D. Yu., Gromov V. P., Rotkov S. I. Robust Vectorization of Graphic Documents with Distinctive Orientation of Linear Objects // In: Proc. of the International Scientific Conference in Computing for Physics and Technology CPT2018. 2018. P. 313-317. (in Russian)
10. Vasin D. Yu., Gromov V. P., Rotkov S. I. Formation of a Segment-Node Model of Graphic Documents with a Pronounced Orientation of Linear Objects // In collection: SCVRT2018 International Scientific Conference of the Moscow Institute of Physics and Technology (State University) Institute of Physical and Technical Informatics. Proceedings of the International Scientific Conference. 2018. P. 265-280. (in Russian)
11. Vasin D. Yu., Rotkov S. I. Automatic Detection of Geometric Errors on Engineering 2D Drawings when Forming Electronic Archives // Privolzhsky Scientific Journal. N. Novgorod: State University of Architecture and Civil Engineering, 2015. No. 3(35). P. 68-81. (in Russian)
12. Vasin D. Yu., Gromov V. P., Rotkov S. I. Geometric Modeling of Raster Images of Documents with Poorly Formalized Description of Objects // In collection: CEUR Workshop Proceedings. ITNT 2019: Proceedings of the 5th Information Technology and Nanotechnology 2019: Image Processing and Earth Remote Sensing. 2019. P. 358-365.
13. Vasin Yu. G., Krakhnov A. D., Utesheva T. Sh. Discrete Geometry Methods in Processing Complicated Graphical Information // Pattern Recognition and Image Analysis (Advances in Mathematical Theory and Applications). 2006. Vol. 16. No. 1. P. 104-105.
14. Perspective Information Technologies of Remote Sensing of the Earth: Monograph / ed. by V. A. Soyfer. Samara: New Technique, 2015. 256 p.
15. Hull J. Document Image Skew Detection: Survey and Annotated Bibliography // Document Analysis Systems II. World Scientific, 1998. P. 40-64.
16. Bloomberg D. S., Kopec G. E., Dasari L. Measuring Document Image Skew and Orientation // Xerox Palo Alto Research Center. Access mode: http://www.leptonica.com/papers/skew-measurement.pdf, free.
17. Vasin D. Yu., Redkin M. A. Automatic Character Recognition Based on the Structural Model of Raster Binary Images // Proceedings of the International Scientific Conference “Situation Centers and IAS4i for Monitoring and Security”, 21-24 November 2016, Protvino, Tsargrad, Moscow region, Russia. P. 44-50.
18. Vasin D. Yu., Redkin M. A. Character Recognition on Large Format Bitmap Images of Documents with a Poorly Formalized Description of Objects // Proceedings of the 27th International Conference GraphiCon 2017, September 24-28, Perm, Russia. P. 303-308.
19. Vasin Yu. G., Vasin D. Yu., Gromov V. P. Intellectual Information Technology for Symbol Extraction from Ill-Structured Graphical Documents // Proceedings of the International Conference on Pattern Recognition and Artificial Intelligence, Montreal, May 13-17, 2018. Published by CENPARMI Centre for Pattern Recognition and Machine Intelligence, Concordia University, Montreal, Quebec H3G 1M8, Canada. P. 724-732.
20. Chang Chein-I. Hyperspectral Imaging: Techniques for Spectral Detection and Classification. N.Y.: Kluwer Academic/Plenum Publishers, 2003. 370 p.
21. Popov M. A., Stankevich S. A. Methods for Optimizing the Number of Spectral Channels in the Problems of Processing and Analysis of Earth Remote Sensing Data // Modern Problems of Remote Sensing of the Earth from Space. Moscow: IKI RAS, 2006. Issue 3, Vol. 1. P. 106-112.
22. Lukin V. Processing of Multichannel RS Data for Environment Monitoring // Proceedings of the NATO Advanced Research Workshop on Geographical Information Processing and Visual Analytics for Environmental Security. Trento, Italy: Springer Netherlands, July 2009. P. 129-138.
23. Kaarna A. Compression of Spectral Images // Vision Systems: Segmentation and Pattern Recognition / ed. by G. Obinata and A. Dutta. Vienna: I-Tech, 2007. P. 269-298.
24. Yu G., Vladimirova T., Sweeting M. N. Image Compression Systems on Board Satellites // Acta Astronautica. 2009. Vol. 64. P. 988-1005.
25. Ponomarenko N. N., Lukin V. V., Zriakhov M. S., Kaarna A., Astola J. Automatic Approaches to On-Land/On-Board Filtering and Lossy Compression of AVIRIS Images // Proceedings of IGARSS. Boston, 2008. Vol. III. P. 254-257.
26. Motta G., Rizzo F., Storer J. A. Compression of Hyperspectral Imagery // Proceedings of the Data Compression Conference. 2003. P. 333-342.
27. Tou J., Gonzalez R. Principles of Pattern Recognition. Moscow: Mir, 1978. 411 p.
28. Vasin D. Yu., Gromov V. P., Pakhomov P. A. Elimination of Information Redundancy of Hyperspectral Images Using the “Well-Adapted” Basis Method // Journal of Physics: Conference Series. 2019. Vol. 1368. 032025. doi:10.1088/1742-6596/1368/3/032025
29. Neimark Yu. I., Vasin Yu. G. Encoding Large Amounts of Information in Connection with Pattern Recognition Problems // News of Higher Educational Institutions. Radiophysics. 1968. No. 7. P. 1081-1086.