
Journal of Physics: Conference Series

1246-0125

10.1016/j.procs.2013.09.301

Hybrid client‐server implementation and microservice architecture of automatic documentation analysis software

Anastasia A. Dzyubanenko

Alexey V. Rabin

Saint-Petersburg State University of Aerospace Instrumentation (SUAI), 67, Bolshaya Morskaia str., Saint-Petersburg, 190000, Russia

2021

2022 0000 0001

An approach to constructing an adaptive architecture for enterprise software has been developed in order to increase the efficiency of automated document processing using semantic and cognitive technologies. The proposed approach builds on existing methods of organizing the architecture of applied software. It is substantiated that the architecture of the developed software for automatic cataloging should have a hybrid client-server implementation that includes elements of modular and microservice architecture. It is shown that semantic analysis of documentation using an automatically updated knowledge base provides a significant reduction in the costs of cataloging, completeness checking, and inventory of documentation, as well as an increase in the quality of design.

Keywords: weakly structured information, automatic analysis of documentation, semantic and cognitive technologies, data cataloging

1. Introduction

3. Ability to localize errors, which, with a good organization of modules, allows defects in one module to be corrected without causing errors in another module.
4. Fast recompilation while fixing an error.
5. Ability to reuse modules.
6. Tools are provided to solve each processing task.
7. High resiliency due to redundancy of critical services.

Availability of analytics tools, which makes it easy to track dependencies between services [4].

The developed architecture includes the following elements of the hybrid architecture (Figure 1):
1. Element "Client".
2. Element "Application Server".
3. Element "Data warehouse".
4. Element "Complex of microservices".

[Figure 1. Diagram labels: Client (three instances); Application Server; Core; Container (several instances); Processing module; Analysis Module; Data Integration Module; Registry of services; Internal Services; External Services; Database; Knowledge base; LOD.]

2. Development of modules for hybrid client‐server implementation

2.1. Element "Client"

The client module interacts with personnel and provides data in various formats: 1. Tabular. 2. Graphic. 3. Graph. 4. Text [5].

To solve visualization tasks, built-in components for working with knowledge graphs are used; data is provided in each of the required formats, with the ability to navigate the knowledge graph. To solve analytic tasks, components for working with multidimensional data are used. The graphical interface is also designed for automated structuring of knowledge with the participation of subject matter experts [6].

2.2. Application server element description

The application server is built on a modular basis and includes: 1. The core. 2. Modules of data analysis. 3. Modules of data processing. 4. Modules of data fusion.

2.3. Description of the core

The flexible core (kernel) is the central component of the software being developed; it interacts with the rest of the modules and processes user requests from the graphical interface. While running, the kernel interacts with data stores and either provides the user with the requested samples or calls processing functions from plug-ins [7, 8].

2.4. Application server module description

Modules interact with external and internal services that implement various stages of working with data. The search for the required service is carried out by the service registry, a software module that is associated with the service model of the ontology and interacts with its ontological description.

2.5. Description of the data processing module

The data processing module includes three stages.

Stage "Data preprocessing".

Preprocessing is aimed at noise reduction in order to improve visual perception for subsequent data processing. For some elements (in particular, for numerical data), the preprocessing stage is skipped (Figure 2).

[Figure 2. Diagram labels: raw data in text format; pre-processing algorithms; dictionaries, algorithms, text processing.]

Stage "Normalization". Normalization refers to the process of converting incoming data to a single format. For example, for numerical data, normalization means the unification of the separators of the integer and fractional parts (Figure 3).

[Figure 3. Diagram labels: noise-free text in string format; dictionaries, templates.]
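As a minimal sketch of the normalization stage for numerical data, assume that each value arrives as a plain string whose decimal separator is either a comma or a dot. The function name `normalize_number` and the choice of the dot as the target separator are illustrative assumptions rather than part of the described software; thousands separators are deliberately not handled.

```python
import re

# A simple number: optional sign, digits, and at most one decimal separator
# (comma or dot) followed by digits.
_SIMPLE_NUMBER = re.compile(r"[+-]?\d+(?:[.,]\d+)?")


def normalize_number(text: str) -> str:
    """Return the number with a dot separating integer and fractional parts.

    Raises ValueError if the input is not a simple decimal number.
    """
    cleaned = text.strip()
    if not _SIMPLE_NUMBER.fullmatch(cleaned):
        raise ValueError(f"not a simple decimal number: {text!r}")
    # Unify the separator of the integer and fractional parts.
    return cleaned.replace(",", ".")


if __name__ == "__main__":
    for raw in ["3,14", " -0.5 ", "42"]:
        print(repr(raw), "->", normalize_number(raw))
```

Rejecting ambiguous input instead of guessing keeps the normalized data consistent; a production implementation would likely rely on the dictionaries and templates mentioned for this stage.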

Description of the data fusion module

This module implements data fusion. Merging (fusion) refers to the process of binding data in accordance with an ontological model in order to ensure the integrity and consistency of the data. The data integration diagram is shown in Figure 5.

[Figure 5. Diagram labels: integration; related data; data in the format of "subject" - "predicate" - "object" triplets.]

3. Development of storage modules, diagnostics and microservice architecture

3.1. Description of the data warehouse element and the data storage object model

2. Model of diagnostic tools. 3. Data model. 4. Model of the institution.

3.2. Description of the "role model" module

The role model (Figure 6) describes the roles involved in the processes. The role model includes the staff of the institution, subdivided into laboratory assistants, management personnel, and engineering workers.

[Diagram labels: Actor; Staff; Full name; Branch; Diagnostics; Date; Executor; Engineering and technical personnel; Qualification; Technician; Engineer; Manufacturing facility; Managing staff; Laboratory assistant; Director; Foreman.]

Figure 6: Object model of institutional personnel
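The role model can be illustrated with a small object hierarchy. This is only a sketch under assumptions: the class and field names are hypothetical, the attributes `full_name` and `branch` are taken from the labels of Figure 6, and the assignment of "Technician"/"Engineer" to engineering personnel and "Director"/"Foreman" to managing staff is inferred from the figure labels rather than stated in the text.

```python
from dataclasses import dataclass
from enum import Enum


class Qualification(Enum):
    # Assumed qualifications of engineering and technical personnel.
    TECHNICIAN = "technician"
    ENGINEER = "engineer"


class ManagementPosition(Enum):
    # Assumed positions of managing staff.
    DIRECTOR = "director"
    FOREMAN = "foreman"


@dataclass(frozen=True)
class Staff:
    """Base role: any employee of the institution."""
    full_name: str
    branch: str


@dataclass(frozen=True)
class LaboratoryAssistant(Staff):
    pass


@dataclass(frozen=True)
class ManagingStaff(Staff):
    position: ManagementPosition


@dataclass(frozen=True)
class EngineeringStaff(Staff):
    qualification: Qualification


def role_of(person: Staff) -> str:
    """Return a human-readable role name for a staff member."""
    if isinstance(person, LaboratoryAssistant):
        return "laboratory assistant"
    if isinstance(person, ManagingStaff):
        return f"managing staff ({person.position.value})"
    if isinstance(person, EngineeringStaff):
        return f"engineering and technical personnel ({person.qualification.value})"
    return "staff"


if __name__ == "__main__":
    # Hypothetical example data.
    print(role_of(EngineeringStaff("Example Person", "Workshop 1", Qualification.ENGINEER)))
```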

Diagnostic model description

The diagnostic model (Figure 7) is a hierarchy of diagnostic tools used to study the state of the equipment. Diagnostics is carried out by an employee of the institution.

[Diagram labels: Analysis; Laboratory assistant; Measuring control; Measurement.]

Figure 7: Object model of diagnostic tools

3.3. Description of the data model

Data on the documents collected in the institution's information system (IS) are represented by text records and numerical values. Numerical and qualitative data are distinguished. The data type hierarchy includes subjective and objective data. Each type has a qualifying field.

Description of the institution model

The company's activities are carried out within the divisions of the institution. In addition to the production process carried out in the workshops, the laboratories carry out analyses of the manufactured products. The model is shown in Figure 9.

[Figure 9. Diagram labels: Subdivision; Manager; Manufacturing facility; Laboratory; Management (department).]

Microservices bundle item description

Services are the source of algorithms in the data processing workflow and are accessed through the corresponding modules. Services are used both for preprocessing data and for solving tasks received from users. The results of on-demand processing are also saved to the knowledge base in order to speed up the execution of similar tasks in the future. Thus, despite the close connection between the modules, their independence is preserved, and each module remains operational as long as the kernel and data stores remain intact [15, 16].
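The idea of saving on-demand results so that similar tasks run faster can be sketched as a simple cache placed in front of a service call. This is an illustration under assumptions: the class `KnowledgeBaseCache`, its in-memory dictionary, and the `(task, payload)` key are hypothetical stand-ins for the knowledge base, whose actual storage scheme is not described here.

```python
from typing import Callable, Dict, Tuple


class KnowledgeBaseCache:
    """In-memory stand-in for storing processing results in a knowledge base."""

    def __init__(self) -> None:
        self._results: Dict[Tuple[str, str], str] = {}
        self.service_calls = 0  # how many times a service was actually invoked

    def process(self, task: str, payload: str, service: Callable[[str], str]) -> str:
        """Return the stored result for (task, payload), calling the service only on a miss."""
        key = (task, payload)
        if key not in self._results:
            self.service_calls += 1
            self._results[key] = service(payload)
        return self._results[key]


if __name__ == "__main__":
    cache = KnowledgeBaseCache()
    # str.upper plays the role of a hypothetical text-processing service.
    print(cache.process("uppercase", "report", str.upper))
    print(cache.process("uppercase", "report", str.upper))
    print("service calls:", cache.service_calls)
```

Because the module only depends on a callable and a store, it keeps working with any service as long as the store is available, which mirrors the independence property described above.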

4. Conclusions

The modules considered above each include a set of components that process data within the module and interact with each other through interfaces. Internal kernel modules interact by calling the API methods of the components; the graphical interface interacts with the kernel, and the kernel with the data source, by sending GET requests to a REST API or through a WebSocket (Figure 10).

[Diagram labels: "Customer": components for generating queries and displaying processing results; Server; Text preprocessing; Preparing the text; Measurement processing; Preparation of measurements; Preliminary text analysis; Measurement analysis; Text processing module; Working with knowledge graphs; Third party embedded services; External services; Data source; Raw data storage; Storage: Processed data; Storage: Semantic data.]

Figure 10: Modules of the developed software
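The GET-based interaction between the graphical interface and the kernel can be sketched with the Python standard library alone. Everything specific in this example is an assumption made for illustration: the `/status` endpoint, the JSON payload, and the names `KernelHandler` and `query_kernel` do not come from the described software, and the WebSocket channel is not shown.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class KernelHandler(BaseHTTPRequestHandler):
    """Hypothetical kernel endpoint that answers GET requests with JSON."""

    def do_GET(self):
        if self.path == "/status":
            body = json.dumps({"kernel": "ok"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        # Keep the example quiet: suppress per-request logging.
        pass


def query_kernel(path: str):
    """Start a local kernel stub, send one GET request, return (status, parsed body)."""
    server = HTTPServer(("127.0.0.1", 0), KernelHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", path)
        response = conn.getresponse()
        raw = response.read()
        conn.close()
        parsed = json.loads(raw) if response.status == 200 else None
        return response.status, parsed
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(query_kernel("/status"))
```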

5. Acknowledgements

The paper was prepared with the financial support of the Ministry of Science and Higher Education of the Russian Federation in the course of the applied research «The comprehensive project to create high-tech production of software tools for automatic analysis of documentation on paper and digital media using semantic-cognitive technologies for cataloging poorly structured information» (unique identifier of the project 075-11-2019-055, Decree of the Government of the Russian Federation N 218, 09.04.2010).

6. References

[1] Saurabh Gupta and Anil Kumar Meena, A Practical Implementation of Automatic Document Analysis and Verification using Tesseract, International Conference on Computational Techniques, Electronics and Mechanical Systems (CTEMS), December 2018. doi:10.1109/CTEMS.2018.8769310.
[2] S. Marinai, Introduction to Document Analysis and Recognition, in: S. Marinai, H. Fujisawa (eds), Machine Learning in Document Analysis and Recognition. Studies in Computational Intelligence, Springer, Berlin, Heidelberg, 2008, vol. 90, pp. 1-20. doi:10.1007/978-3-540-76280-5_1.