<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Journal of Physics: Conference Series</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1246-0125</issn>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1016/j.procs.2013.09.301</article-id>
      <title-group>
        <article-title>Hybrid client-server implementation and microservice architecture of automatic documentation analysis software</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anastasia A. Dzyubanenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexey V. Rabin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Saint-Petersburg State University of Aerospace Instrumentation, SUAI</institution>
          ,
          <addr-line>67, Bolshaya Morskaia str., Saint-Petersburg, 190000</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>2022</volume>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>An approach to constructing an adaptive architecture for enterprise software has been developed to increase the efficiency of automated document processing using semantic and cognitive technologies. The proposed approach builds on existing methods of organizing application software architecture. It is substantiated that the architecture of the developed software for automatic cataloging should have a hybrid client-server implementation that includes elements of modular and microservice architecture. It is shown that semantic analysis of documentation using an automatically updated knowledge base significantly reduces the costs of cataloging, completeness checking, and inventory of documentation, and improves the quality of design.</p>
      </abstract>
      <kwd-group>
        <kwd>Weakly structured information</kwd>
        <kwd>automatic analysis of documentation</kwd>
        <kwd>semantic and cognitive technologies</kwd>
        <kwd>data cataloging</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction </title>
      <p>3. Ability to localize an error, which, with well-organized modules, allows defects in one module to be corrected without causing errors in another module.
4. Fast recompilation while fixing an error.
5. Ability to reuse modules.
6. Tools are provided to solve each processing task.
7. High resiliency due to redundancy of critical services.</p>
      <p>Analytics tools are available, which makes it easy to track dependencies between services [4].
The developed architecture includes elements from the hybrid architecture (Figure 1):
1. Element "Client".
2. Element "Application Server".
3. Element "Data warehouse".
4. Element "Complex of microservices".</p>
      <p>[Figure 1 (diagram not reproduced); labels: Client; Application Server; Container; Core; Processing module; Analysis Module; Data Integration Module; Registry of services; Internal Services; External Services; Database; Knowledge base LOD.]</p>
      <p>2. Development of modules for hybrid client-server implementation</p>
    </sec>
    <sec id="sec-2">
      <title>2.1. Element "Client" </title>
      <p>The client module interacts with personnel and provides data in various formats:
1. Tabular.
2. Graphic.
3. Graph.
4. Text [5].</p>
      <p>Visualization tasks are solved with built-in components for working with knowledge graphs: data is provided in each of the required formats, with the ability to navigate the knowledge graph. Analytic tasks are solved with components for working with multidimensional data. The graphical interface is also designed for automated structuring of knowledge with the participation of subject matter experts [6].</p>
    </sec>
    <sec id="sec-3">
      <title>2.2. Application server element description</title>
      <p>The application server is built on a modular basis and includes:
1. The core.
2. Modules of data analysis.
3. Modules of data processing.
4. Modules of data fusion.</p>
    </sec>
    <sec id="sec-4">
      <title>2.3. Description of the core</title>
      <p>The flexible core is the central component of the software being developed. It interacts with the other modules, processes user requests from the graphical interface, interacts with the data stores, and either provides the user with the requested samples or calls processing functions from plug-ins [7, 8].</p>
    </sec>
    <sec id="sec-5">
      <title>2.4. Application server module description</title>
      <p>Modules interact with external and internal services that implement the various stages of working with data. The required service is located by the service registry, a software module that works with the ontological description of the service model in the ontology.</p>
    </sec>
    <sec id="sec-6">
      <title>2.5. Description of the data processing module</title>
      <p>The data processing module includes three stages.</p>
      <p>Stage "Data preprocessing".</p>
      <p>Preprocessing is aimed at reducing noise in order to improve visual perception of the data for subsequent processing. For some elements (in particular, numerical data), the preprocessing stage is skipped (Figure 2).</p>
      <p>[Figure 2 (diagram not reproduced); labels: Dictionaries, algorithms, text processing; Pre-processing algorithms; Raw data in text format.]</p>
      <sec id="sec-6-2">
        <title>Stage "Normalization"</title>
        <p>Normalization is the process of converting incoming data to a single format. For numerical data, for example, normalization means unifying the separator between the integer and fractional parts (Figure 3).</p>
        <p>[Figure 3 (diagram not reproduced); labels: Noise-free text in string format; Dictionaries, templates.]</p>
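        <p>The separator unification described for the normalization stage can be illustrated with a minimal sketch. It is not the implementation used in the developed software: the function name normalize_decimal and the choice of "." as the target separator are assumptions made for the example.</p>
        <preformat>
```python
import re


def normalize_decimal(value: str) -> str:
    """Return the number with '.' separating the integer and fractional parts.

    Hypothetical helper, written for illustration only.
    """
    # Drop surrounding whitespace and spaces used as digit-group separators.
    cleaned = value.strip().replace(" ", "")
    # A single comma between two digit runs is treated as a decimal separator.
    if re.fullmatch(r"[+-]?\d+,\d+", cleaned):
        return cleaned.replace(",", ".")
    # Anything else (already normalized, or not a simple decimal) is left as is.
    return cleaned


samples = ["3,14", "3.14", " 42 ", "-0,5"]
normalized = [normalize_decimal(s) for s in samples]
print(normalized)
```
        </preformat>
        <p>Values that do not match the simple "digits, comma, digits" pattern are returned unchanged, so the sketch never guesses at ambiguous input such as thousands separators.</p>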
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Description of the data fusion module </title>
      <p>This module implements data fusion. Fusion refers to the process of binding data in accordance with an ontological model in order to ensure the integrity and consistency of the data. The data integration diagram is shown in Figure 5.</p>
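      <p>As a rough illustration of binding data against an ontological model, the sketch below merges "subject" - "predicate" - "object" triplets from two sources and keeps only those whose predicate is known to the model. The Triple type, the fuse function, the predicate names, and the sample records are all hypothetical; the paper does not specify how the module is implemented.</p>
      <preformat>
```python
from typing import Iterable, NamedTuple, Set


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Toy stand-in for the ontological model (an assumption): the predicates
# that may be used to bind data.
ALLOWED_PREDICATES = {"performedBy", "hasValue"}


def fuse(*sources: Iterable[Triple]) -> Set[Triple]:
    """Merge triples from several sources into one duplicate-free set.

    Triples whose predicate is not in the model are dropped, which is one
    simple way to keep the fused data consistent with the model.
    """
    fused = set()
    for source in sources:
        for triple in source:
            if triple.predicate in ALLOWED_PREDICATES:
                fused.add(triple)
    return fused


lab = [Triple("measurement-1", "performedBy", "assistant-7")]
archive = [
    Triple("measurement-1", "performedBy", "assistant-7"),  # duplicate of lab
    Triple("measurement-1", "hasValue", "3.14"),
    Triple("measurement-1", "unknownLink", "x"),  # not in the model, dropped
]
graph = fuse(lab, archive)
print(sorted(graph))
```
      </preformat>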
      <p>[Figure 5 (diagram not reproduced); labels: Related data; Data in the format of "subject" - "predicate" - "object" triplets.]</p>
      <p>3. Development of storage modules, diagnostics and microservice architecture</p>
      <p>3.1. Description of the data warehouse element and the data storage object model</p>
      <p>2. Model of diagnostic tools.
3. Data model.
4. Model of the institution.</p>
    </sec>
    <sec id="sec-8">
      <title>3.2. Description of the "role model" module</title>
      <p>The role model (Figure 6) describes the roles involved in the processes. The role model includes the
staff of the institution, subdivided into laboratory assistants, management personnel, and engineering
workers.</p>
      <p>Figure 6: Object model of institutional personnel. Diagram labels: Actor; Staff; Full name; Branch; Diagnostics; Date; Executor; Engineering and technical personnel; Qualification; Technician; Engineer; Manufacturing facility; Managing staff; Laboratory assistant; Director; Foreman.</p>
      <sec id="sec-8-1">
        <title>Diagnostic model description</title>
        <p>The diagnostic model (Figure 7) is a hierarchy of diagnostic tools used to study the state of the equipment. Diagnostics is carried out by an employee of the institution.</p>
        <p>Figure 7: Object model of diagnostic tools. Diagram labels: Analysis; Laboratory assistant; Measuring control; Measurement.</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>3.3. Description of the data model</title>
      <p>Data on the documents collected in the institution's information system (IS) are represented by text records and numerical values. Numerical and qualitative data are distinguished. The data type hierarchy includes subjective and objective data. Each type has a qualifying field.</p>
    </sec>
    <sec id="sec-10">
      <title>Description of the institution model </title>
      <p>The company's activities are carried out within the divisions of the institution. In addition to the production process carried out in the workshops, the laboratories analyze the manufactured products. The model is shown in Figure 9.</p>
      <p>[Figure 9 (diagram not reproduced); labels: Subdivision; Manager; Manufacturing facility; Laboratory; Management (department).]</p>
    </sec>
    <sec id="sec-11">
      <title>Description of the "Complex of microservices" element</title>
      <p>Services are the source of the algorithms used in data processing and are accessed through the corresponding modules. They are used both for preprocessing data and for solving tasks submitted by users. The results of on-demand processing are also saved to the knowledge base in order to speed up the execution of similar tasks in the future. Thus, despite the close connection between the modules, their independence is preserved: each module remains operational as long as the core and data stores are available [15, 16].</p>
    </sec>
    <sec id="sec-12">
      <title>4. Conclusions </title>
      <p>The modules considered earlier comprise sets of components that process data within a module and interact with each other through interfaces. Internal core modules interact by calling the API methods of the components. The graphical interface interacts with the core, and the core with the data source, by sending GET requests to a REST API or through a WebSocket (Figure 10).</p>
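      <p>The GET-based interaction between the graphical interface and the core can be sketched as follows. The base address, the resource name "documents", and the query parameters are hypothetical, since the paper does not describe the API itself. The sketch only builds the request object and does not send it.</p>
      <preformat>
```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base address of the REST API (an assumption for the example).
BASE_URL = "http://localhost:8080/api"


def build_get_request(resource: str, **params: str) -> Request:
    """Build a GET request for a resource with optional query parameters."""
    query = urlencode(params)
    url = f"{BASE_URL}/{resource}"
    if query:
        url = f"{url}?{query}"
    return Request(url, method="GET")


# The interface asks for processed documents, ten at a time.
request = build_get_request("documents", status="processed", limit="10")
print(request.get_method(), request.full_url)
```
      </preformat>
      <p>Passing the resulting object to urllib.request.urlopen would perform the call; a WebSocket channel, the alternative mentioned in the text, would need a separate client library.</p>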
      <p>Figure 10: Modules of the developed software. Diagram labels: "Customer" (components for generating queries and displaying processing results); Server; Text preprocessing; Preparing the text; Measurement processing; Preparation of measurements; Preliminary text analysis; Measurement analysis; Text processing module; Working with knowledge graphs; Data source (raw data storage); Storage (processed data); Storage (semantic data); Third party embedded services; External services.</p>
    </sec>
    <sec id="sec-13">
      <title>5. Acknowledgements </title>
      <p>The paper was prepared with the financial support of the Ministry of Science and Higher Education
of the Russian Federation in the course of the applied research «The comprehensive project to create
high-tech production of software tools for automatic analysis of documentation on paper and digital
media using semantic-cognitive technologies for cataloging poorly structured information» (unique
identifier of the project 075-11-2019-055, Decree of the Government of the Russian Federation N 218,
09.04.2010).</p>
    </sec>
    <sec id="sec-14">
      <title>6. References </title>
      <p>[1] Saurabh Gupta and Anil Kumar Meena, A Practical Implementation of Automatic Document
Analysis and Verification using Tesseract, International Conference on Computational Techniques,
Electronics and Mechanical Systems (CTEMS), December 2018.
doi:10.1109/CTEMS.2018.8769310.
[2] S. Marinai, Introduction to Document Analysis and Recognition, in: S. Marinai, H. Fujisawa (eds),
Machine Learning in Document Analysis and Recognition. Studies in Computational Intelligence,
Springer, Berlin, Heidelberg, 2008, vol. 90, pp. 1-20. doi:10.1007/978-3-540-76280-5_1.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>