Recognition of Manuscript Tables in Computer Processing of Technical Transport Documentation

Elena Y. Bursian
Emperor Alexander I St. Petersburg State Transport University, St. Petersburg, Russia
bursianeu@mail.ru

Anton M. Demin
Emperor Alexander I St. Petersburg State Transport University, St. Petersburg, Russia
ad2271@ya.ru

Alexander P. Glukhov
Joint Stock Company Railway Research Institute (JSC "VNIIZhT"), Moscow, Russia

Copyright © by the paper's authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). In: A. Khomonenko, B. Sokolov, K. Ivanova (eds.): Selected Papers of the Models and Methods of Information Systems Research Workshop, St. Petersburg, Russia, 4-5 Dec. 2019, published at http://ceur-ws.org

Abstract

The article discusses the recognition of handwritten characters in tables of technical railway documentation and in test works of PGUPS students. In the model under study, a skeleton graph is constructed for each recognizable region, and the procedure for statistical processing of the characteristics of the skeleton graph branches is analyzed. Skeletal graphs are constructed for reference symbols and are dynamically replaced during recognition by the skeletal graphs of the recognized areas. The nature of the dependencies between the components of the skeletal graphs of reference symbols and of dynamically added objects is investigated. The statistical results obtained are applied in the automatic recognition of handwritten characters.

Introduction

The task of automatic processing and optical recognition of a pre-scanned or captured image is relevant in many fields of activity. Verification of scanned completed tables and forms, tests, questionnaires, tests taken by PSU students during distance learning, and tables of technical railway documentation all require handwriting recognition.

Universal electronic document management systems and individual specialized programs are based on the principles of the general theory of recognition. The monographs of V. N. Vapnik [Vap74], A. Ya. Chervonenkis, Yu. I. Zhuravlyov [Zhu05], A. B. Merkov [Mer14], V. V. Ryazanov and O. V. Senko and the research of L. M. Mestetsky [Mes09], D. A. Gavrilov [Gav19], N. A. Lomov [Lom16], Yu. V. Vizilter [Vis12] and Ya. A. Furman relate to practical application and have recognized applied value. The main foreign works are those of R. C. Gonzalez, L. Lam [Lam95], C. Suen, T. Y. Zhang [Zha84], C. Y. Suen, D. T. Lee [Lee82], H. Blum [Blu67], R. O. Duda [Dud00], P. E. Hart, R. Shapiro, L. Shapiro, G. Stockman, P. Viola and M. Jones [Vio04], S. Rosset [Ros04], and L. C. Molina [Mol02].

Computer systems already exist for recognizing handwritten characters in slightly noisy images: ABBYY FormReader, OmniPage, CuneiForm, ReadirisPro. However, these systems are not always effective for specialized tables, since the recognized information in many cases has a predetermined structure. Recognition of handwritten characters in technical tables on low-quality images is an urgent scientific and technical task.

For handwritten tables and test documents it is usually assumed that specific characters follow different density distributions, but these distributions are unknown. During computer processing, the characteristics of the sample can be calculated while the distributions themselves remain unknown.

1 Statement of the problem

To build an automatic recognition system for individual handwritten characters, a set of basic recognizers is used, each of which does not always recognize an object with high probability. These weak recognizers are grouped, and a committee of classifiers is built using the AdaBoost algorithm.

When test works are checked automatically, it is permissible to assume that the classifier divides the space X of vectors of informative attributes into two sets, X1 and X2, since a test usually admits a single answer. A committee of classifiers can also be built automatically for each character separately.

To build a committee of classifiers, many different basic recognizers must first be constructed. Constructing basic recognizers by estimating the parameters of the multidimensional distribution densities of the vector characteristics of handwritten characters is an urgent task.

2 Construction of basic recognizers and final classifier

After the document is scanned, the image is reduced to a two-level (binary) form. Figure 1 shows a scanned image of a table of railway documentation.

Figure 1: Railway Documentation Table

It should be noted that a modern approach to maintaining railway documentation requires the mandatory introduction of electronic document management [*].

For each recognized area, a skeletal description is constructed. The calculation of the characteristics of the skeletal representation of the region is based on the following definition. The point P belongs to the skeletal representation of the domain D if and only if the following statement holds:

$$B_r(P) \subset D \;\&\; \nexists\, B_{r_1}(P_1) \subset D : B_r(P) \subset B_{r_1}(P_1) \;\&\; B_r(P) \neq B_{r_1}(P_1),$$

where $B_r(P)$ is a circle of radius r centered at point P [Dud00].

The set of points P belonging to the skeletal representation is also called the skeleton of the region D. In other words, the skeleton of the region is the set of centers of the maximal circles lying in the region (Figure 2).

Figure 2: Skeletal representation of the area

Based on the region's skeleton, the characteristics of a loaded (weighted) graph, called the region's skeleton graph, are calculated for each recognized object. The skeletal graph of a recognized object can be represented as follows (Figure 2, Figure 3).
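The maximal-circle definition of the skeleton can be illustrated on a pixel grid. The following is a minimal sketch, not the authors' implementation: it assumes the region is given as rows of text in which `#` marks a region pixel, takes the radius at a pixel to be the distance to the nearest non-region pixel, and uses the fact that one circle lies inside another exactly when the distance between the centers plus the smaller radius does not exceed the larger radius. The names `skeleton_points` and `REGION` are hypothetical, and the brute-force search is only suitable for small images.

```python
import math


def _dist(p, q):
    """Euclidean distance between two (row, col) pixel centers."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def skeleton_points(rows):
    """Return the (row, col) region pixels whose largest inscribed circle
    is not contained in the largest inscribed circle of another pixel."""
    region = [(y, x) for y, line in enumerate(rows)
              for x, ch in enumerate(line) if ch == "#"]
    outside = [(y, x) for y, line in enumerate(rows)
               for x, ch in enumerate(line) if ch != "#"]
    # Assumption: the image has a background border, so `outside` is not empty.
    # Radius of the largest circle centered at p that stays inside the region.
    radius = {p: min(_dist(p, q) for q in outside) for p in region}
    eps = 1e-9  # tolerance for floating-point comparison
    skeleton = set()
    for p in region:
        # B_r(P) is inside B_r1(P1) when |P - P1| + r <= r1.
        contained = any(
            q != p and _dist(p, q) + radius[p] <= radius[q] + eps
            for q in region
        )
        if not contained:
            skeleton.add(p)
    return skeleton


# Hypothetical example: a 3 x 7 rectangular stroke on a background border.
REGION = [
    ".........",
    ".#######.",
    ".#######.",
    ".#######.",
    ".........",
]

if __name__ == "__main__":
    print(sorted(skeleton_points(REGION)))
```

For this rectangle the sketch keeps the central run of the middle row together with the four corner pixels, which mirrors the central segment and diagonal branches of the continuous skeleton. A practical system would more likely rely on a distance transform or a thinning algorithm for speed.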
Figure 3: Skeletal graph of the area

For each branch of the skeletal graph, a vector of informative characteristics is constructed from the slope coefficients of the edges of the skeleton graph or directly from the slope angles of those edges. When the symbols are written in one hand, there is a statistical dependence between the values of the slope angles of the edges of the skeletal graph taken at equal intervals.

The regression of the slope angles of a symbol's skeletal graph on the values of the corresponding angles taken at the previous intervals can be calculated using multivariate regression analysis, which yields a system for determining the unknown regression coefficients:

$$
\begin{cases}
\varphi_0^{(1)} = a_0 + a_1\varphi_1^{(1)} + \dots + a_m\varphi_m^{(1)} + \varepsilon^{(1)} \\
\varphi_0^{(2)} = a_0 + a_1\varphi_1^{(2)} + \dots + a_m\varphi_m^{(2)} + \varepsilon^{(2)} \\
\dots \\
\varphi_0^{(k)} = a_0 + a_1\varphi_1^{(k)} + \dots + a_m\varphi_m^{(k)} + \varepsilon^{(k)}
\end{cases}
$$

Here $\boldsymbol{\varphi}^T = (\varphi_1, \dots, \varphi_m)$ is the vector of regression factors. For the symbol with number i in the training set, the regression factors take the values $\varphi_1^{(i)}, \dots, \varphi_m^{(i)}$, which correspond to the angles of inclination of the edges of the symbol's skeletal graph, and $\varphi_0^{(i)}$ is the response value. $\boldsymbol{a}^T = (a_0, a_1, \dots, a_m)$ is the vector of estimated regression parameters, and $\boldsymbol{\varepsilon}^T = (\varepsilon^{(1)}, \dots, \varepsilon^{(k)})$ is the error vector. The number of unknown regression parameters does not exceed the number of characters in the training set, k.
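The regression system above can be solved for the unknown coefficients by ordinary least squares. The sketch below is one possible way to do this and is not taken from the paper: it assumes each training symbol contributes one row of slope-angle factors and one response angle, calls the number of factors m, and solves the normal equations by Gauss-Jordan elimination in plain Python. The function name `fit_regression` and the sample angles are hypothetical.

```python
def fit_regression(factors, responses):
    """Least-squares estimate of (a_0, a_1, ..., a_m).

    factors   -- k rows, each holding the m slope angles of one symbol
    responses -- k response angles, one per training symbol
    """
    k, m = len(factors), len(factors[0])
    n = m + 1  # number of unknown regression parameters
    if n > k:
        raise ValueError("more unknown parameters than training symbols")
    # Design matrix with a leading column of ones for the intercept a_0.
    design = [[1.0] + [float(v) for v in row] for row in factors]
    # Augmented normal equations: (X^T X | X^T y).
    aug = [
        [sum(design[s][i] * design[s][j] for s in range(k)) for j in range(n)]
        + [sum(design[s][i] * responses[s] for s in range(k))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("factors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                scale = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= scale * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Hypothetical training data: k = 5 symbols, m = 2 previous slope angles each.
ANGLES = [(10.0, 20.0), (15.0, 25.0), (30.0, 10.0), (40.0, 35.0), (5.0, 50.0)]
# Responses generated without noise from a_0 = 0.5, a_1 = 2.0, a_2 = -1.0.
RESPONSES = [0.5 + 2.0 * p1 - 1.0 * p2 for p1, p2 in ANGLES]

if __name__ == "__main__":
    print(fit_regression(ANGLES, RESPONSES))
```

The check that the number of parameters does not exceed the number of training symbols mirrors the condition stated in the text. With noisy angles the same call returns the least-squares estimates rather than an exact fit.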