Real-Time Ukrainian Text Recognition and Voicing

Kateryna Tymoshenko a, Victoria Vysotska a, Oksana Kovtun b, Roman Holoshchuk a and Svitlana Holoshchuk a
a Lviv Polytechnic National University, S. Bandera Street, 12, Lviv, 79013, Ukraine
b Vasyl' Stus Donetsk National University, 600-richchia Street, 21, Vinnytsia, 21021, Ukraine

Abstract
The main applied task of this project is to help people with visual impairments and to teach the correct pronunciation of words to people learning Ukrainian as a foreign language. The problem is solved by developing software that recognises text in video mode: when the user taps the screen, the selected text is read aloud.

Keywords
Text, real-time, recognition, image, text dubbing, text sounding, speech to text, text to speech, text recognition, text recognition algorithm, recognised text, video mode, Android operating system, OCR algorithm, improved algorithm, optical character recognition, video recording mode, user-friendly interface, information technology, textual content, text analysis, intelligent system

1. Introduction
The main problems with text recognition are the following [1-5]:
 Programs are designed to identify text from only one image;
 Some programs are complicated for the user to understand;
 A large number of characters slows down text recognition;
 The text to be recognised has to be selected manually by the user;
 Some Ukrainian symbols are read incorrectly;
 Programs are not adapted for text recognition in video mode;
 Programs are not designed for visually impaired people;
 There is no voicing of the text found in an image.
This study addresses the need of visually impaired people to read a text simply by pointing the camera at it and hearing the result [1-5]. The software can be used both by visually impaired users and by ordinary users who are learning the correct pronunciation of foreign words.
The purpose of this work is to develop algorithms for text recognition and voicing in video mode and to implement them for people with visual impairments and people who are learning a foreign language and its correct pronunciation. The main idea of the project is to develop software for the Android operating system that helps visually impaired people understand a text through its sound [5-7]:
 Text recognition technologies are promising but have some disadvantages on mobile phones (the speed of text recognition and the collection of all text within the phone screen);
 The Ukrainian language in Android works incorrectly or is not supported at all.
The paper aims to improve the text recognition algorithm with respect to existing analogues, namely its speed, by about 10-20%. To accomplish this purpose and bring about the intended result, the following specific tasks have to be solved:
1. Implement text recognition in video mode;
2. Implement text recognition in Ukrainian and English;
3. Modify text recognition in Ukrainian and English for video recording mode;
4. Produce the sound of the text when the user taps the screen;
5. Update the image and recognise the detected text at the camera frame rate (30-60 fps).
We expect to obtain the following scientific results aimed at solving the problem specified above:
 Implementation of text recognition in video mode;
 Voicing of the text when the user taps the screen;
 Updating of the image and recognition of the detected text at the camera frame rate (30-60 fps).
The scientific novelty of our research is the development of an algorithm for recognising text in video mode by overlaying images so that the software can identify and process the text faster.

2. Related works and applications
Consider the existing analogues of the product to be created, namely:
 ABBYY FineReader 10 Home Edition [8-12];
 OCR CuneiForm [13-16];
 CamScanner [17-21].
ABBYY FineReader 10 Home Edition is a well-implemented utility for recognising and scanning text [8-12]. The new version of the program copes well with transforming paper documents and various images into electronic form. The program has a user-friendly interface in Russian. Once the processing of documents is completed, the text data can be edited in any editor.
ABBYY FineReader 10 Home Edition features [8-12]:
 The text recognition program works with 179 languages;
 After processing the text, there is an option to save it, share it by e-mail or publish it on the Internet;
 Built-in tools responsible for increasing the quality of photos and images;
 The utility supports Windows 7, as well as Windows 8 and 10;
 The ability to convert the finished text into PDF, DOC, RTF, XLS and HTML;
 The formatting and the style of the document are preserved, with the subsequent ability to continue work in Microsoft Word, Outlook and Excel;
 The program supports the recognition of documents received from MFPs, scanners, digital cameras and even from a mobile phone;
 There is a Russian version with a simple interface.
Advantages [8-12]:
 The developer, ABBYY, provides periodic updates to its scanning and text recognition utility;
 Processing of documents that were photographed on a mobile device;
 Availability of useful add-ons, such as ABBYY Screenshot Reader and others;
 High accuracy of optical recognition of text information;
 Maintaining high image quality of documents;
 Any current Windows system is supported, with excellent interoperability with Microsoft Office editors;
 The text recognition program saves content in an editable format;
 The application processes paper documents with high quality.
The disadvantage is that the trial (beta) version of the program has a limited validity period and reduced functionality compared to the licensed version [8-12].
OCR CuneiForm quickly recognises both scanned and photographed texts [13-16]. In the process of text recognition, a massive range of printed fonts is processed, while the original structure of electronic documents stays preserved. The result obtained in the OCR program can be sent to text editors for further work. The utility's functionality is comparable with the better-known ABBYY FineReader, except that CuneiForm is free to download.
OCR CuneiForm features are [13-16]:
 Saving the original font structure;
 Possibility of optical recognition of various text documents;
 Ability to batch process multiple documents;
 The free CuneiForm program provides fast and efficient recognition of characters and text;
 Conversion of electronic graphic documents, as well as paper documents, into an editable form;
 The developer, Cognitive Technologies, periodically updates the recognition algorithms of the utility;
 The user-friendly interface of the OCR CuneiForm program has menu sections in Russian;
 Conversion of various text and graphic files and photocopies of faxes to editable file formats for Microsoft Office;
 The latest version, CuneiForm 12, includes adaptive content recognition;
 Very decent optimisation for operating systems such as Windows 7, as well as XP and Vista;
 Work with already recognised text, which provides document analysis and convenient search of tables, images, and text blocks.
We should also specify the advantages of the application [13-16]:
 The utility can be downloaded for free for text scanning, recognition and analysis;
 High-quality recognition of text and graphic information;
 Program capabilities include processing documents obtained from laser and dot matrix printers;
 Convenient system of optical recognition of most existing printed fonts;
 Bringing any file structure into an editable format for well-known office programs and text editors;
 Complete analysis of scanned documents and a comfortable system for finding the desired table, picture or text;
 Developers update their text recognition program by extending its options;
 Windows systems including Vista and XP are supported;
 A high-level recognition system processes poor photocopies of faxes.
The disadvantage is that the CuneiForm program may slow down in the process of document recognition [13-16].
CamScanner is an intelligent document management solution for small businesses, organisations, government agencies, and schools [17-21]. It is ideal for those who want to sync, share and manage multiple files on all devices. Its functionality includes [17-21]:
 Quick scanning of documents. The phone's camera is used to scan (photograph) any paper document: cheques, notes, invoices, discussion boards, business cards, certificates. There is a batch scan mode which saves even more time.
 Scanning quality optimisation. Trimming and auto-enhancement make CamScanner unique. It ensures the purity and sharpness of texts and images.
 Easy search of documents. Search for any files in seconds. When searching for papers, the powerful "OCR for search" tool recognises the text in PDF documents.
 Intelligent document management on mobile devices. Documents can be divided into groups, sorted by date or tags, viewed as a sheet or tiles, etc. There is an option to set passwords for classified documents to prevent information leakage.
 Registration and synchronisation of documents. It is possible to register with CamScanner and save/synchronise documents immediately. After logging in, users can edit and synchronise documents on smartphones, tablets, PCs, and cloud services.
The analysis of programs and developments in the field helps us to define precise product requirements. The application being developed aims to provide the user with a user-friendly interface that automatically recognises text; after the user taps the frame on the screen, the application voices its content. Both visually impaired and ordinary people can use this application.
The following functions and algorithms are to be implemented in the program:
1. An algorithm for text recognition during a dynamic frame change;
2. Entering the recognised text into the frame;
3. Voicing of the recognised text.
Having considered the analogues, it is possible to define the basic requirements for the developed product. The target audience can be divided into two groups. The first group is ordinary people who use the application for their personal everyday needs. The application might be used as an additional resource in learning a foreign language: it will help the user understand the correct pronunciation of foreign words and check the recognised text for grammatical errors. The second target group is people with visual impairments. The application will help them listen to the text recognised by the program.
The software will be implemented for the Android operating system using the Java programming language. All the user has to do is tap the icon on the phone screen to launch the software. The hardware requirement is 10 MB of free disk space. System requirements include:
 Android 4.0 or higher operating system;
 Working speech synthesis;
 A working camera on the phone.
The user documentation includes:
 A product use guide, which describes all the functionality and step-by-step algorithms for working with the product;
 Basic examples of work with the software.
The user interface consists of a graphical interface and a text box. The graphical interface comprises the camera in video mode. The text box tells the user how to hear the text and how to zoom the camera. Because dynamic frame processing takes many resources, a buffer is used to store the frames until the text is fully recognised. The software runs on any modern mobile phone. The software must ensure the security of the user's input files and prevent access to the intellectual property of the user or a third-party enterprise.
Having considered all the specifications of the requirements and the justification of the technology, we can draw the following conclusions:
1. Functions and algorithms that will be implemented in the program are [22-24]:
a. An algorithm for text recognition when dynamically changing frames in Ukrainian and English and entering it into the frame;
b. An algorithm for checking the recognised text for grammatical errors;
c. Voicing of the recognised text.
2. The hardware requirement is 10 MB of free disk space.
3. System requirements:
a. Android 4.0 or higher operating system;
b. Speech synthesis available;
c. A working camera on the phone.
4. Quality requirements [25-35]:
a. Functionality.
i. Localisation – the software will support three languages (English, Russian and Ukrainian).
ii. Reporting is used in case of errors, and the feedback will be sent to the developers.
iii. Security – the software is closed source, and the information is encrypted with the MD5 algorithm.
b. Usability.
i. User-friendly interface.
ii. A minimum number of items on the screen.
iii. A help system built into the program.
c. Reliability.
i. Errors in text recognition are about 2%.
ii. The software will be tested automatically and manually.
iii. The software will be ready to work and will work 24/7.
iv. In case of failure, a report will be sent to the developers' mail.
d. Performance.
i. It takes 0.10 seconds to start the program. After selecting the recognition language, the program is entirely ready for use.
ii. It takes up to 10 MB of free disk space.
iii. The program requires 15 kWh.
e. Supportability.
i. Automatic tests are created to verify the program.
ii. The program runs on Android 4.0 or higher.
iii. Installation on a mobile phone is done by running the .apk file.
iv. There is a guide inside the system for its proper use, which can be called up through the menu.
General requirements are the following [36-45]:
 Unification. The software is developed according to the waterfall methodology of software development. Since most users use the Android operating system, the software runs on it. The Java programming language in the Android Studio IDE is used for development.
 Interoperability. The program interacts with the phone's camera and with speech synthesis to voice the text. It is tested for camera performance, correct text recognition, and speech synthesis to ensure proper text voicing. The video mode plays while the camera is used.
 Mobility. The software is compatible with Android phones.
 Scalability. The program is designed for visually impaired people and/or people who want to learn the correct pronunciation of foreign words.
 Interaction with the user. The software has an easy-to-understand interface that contains a menu and a screen for the user to interact with.

3. Problem statement
Having studied modern approaches to the problem of mobile applications for visually impaired people, we have set the following tasks for the program to be developed:
 Implement text recognition in video mode;
 Implement text recognition in Ukrainian and English;
 Voice the text when the user taps the screen;
 Update the image and recognised text at the camera frame rate (30-60 fps).

3.1. Use Case Diagram
The main actor is the user, who needs to recognise a text and hear how it sounds (Fig. 1).
Figure 1: Use case diagram
The main successful scenario:
1. The user opens the application and points the camera at the text;
2. The program begins to recognise the text;
3. After its full recognition, the text is highlighted by a frame;
4. The user taps the smartphone screen;
5. The system voices the recognised text.
Alternative flows:
1. The text is in Ukrainian. The user selects the Ukrainian language in the application menu;
2. The user needs to know the number of words in the text or check it for errors:
a. The user selects the appropriate menu item;
b. The system analyses the recognised text;
c. The system displays the result on the screen.

3.2. Activity Diagram
The activity diagram shows all possible alternatives that occur under certain guard conditions and the parallel processes that run in the system (Fig. 2).
Figure 2: Activity diagram

3.3. Class Diagram
The following classes are selected in the system: ScreenFrame, Processor, CaptureActivity, StartCamera, FramesOnScreen, CameraReview. Consider each separately (Fig. 3).
 CaptureActivity is the main class. All screen elements are initialised here, and all menu items are handled (switching between English and Ukrainian, counting the number of words, checking spelling). There are also checks for the camera and for the availability of speech synthesis on a particular version of the Android operating system.
Figure 3: Class diagram
 ScreenFrame is the class responsible for drawing borders around text, including their colour and style. This class also holds the coordinates of the beginning and end of the rectangle (upper left and lower right corners). Due to this, the text fits exactly inside the frame.
 Processor is the class responsible for the operation of the image overlay algorithm; that is, it works directly with the ScreenFrame and FramesOnScreen classes so that the algorithm functions correctly.
 StartCamera is the class responsible for processing the image from the camera and the focus, which the user can change manually. A buffer that stores the image is created here.
 FramesOnScreen is the class that implements the image overlay algorithm. It uses methods and changes from the ScreenFrame level, but only for dynamic frame changes. Thus, the recognised text is stored in the clipboard along with the image and the frame.
 CameraReview is the class in which the image overlay algorithm finally shows the result on the screen, and the user can tap the screen and hear the text in it.

3.4. Sequence Diagram
To start working, it is enough for the user to open the program and point the camera at the text. The program will automatically begin recognising it (Fig. 4). When the text is identified, it is surrounded by a frame on the screen, and the user only needs to tap the screen to hear it. There are also built-in functions for counting the number of words and grammatical errors in the text. To use these functions, the user only needs to select the appropriate menu item – the result will be displayed on the screen.
Figure 4: Sequence diagram

3.5. Automaton Graph
After starting, the system waits for text to appear in the smartphone's camera. As soon as text appears, the program recognises it and frames it. The system then waits for a tap on the screen and voices the recognised text (Fig. 5).
Figure 5: Automaton graph

3.6. Package Diagram
The package diagram shows the packages present in the system and their relations (Fig. 6):
Figure 6: Package diagram

4. Methods of analysis
4.1. Description of the text recognition algorithm
To understand how a text recognition algorithm works, we must first look at the types of these algorithms and find out which one works best in real conditions [46-67]. Among the considered options, the highest character recognition performance is reported on the ICDAR 2003 dataset. Unlike more classic optical character recognition (OCR) problems, where characters tend to be monotonous on fixed backgrounds, character recognition in a scene image is potentially more difficult due to the large number of possible variations in background, lighting, textures and fonts. Indeed, significant efforts have been made to create systems that integrate dozens of cleverly combined capabilities and stages of image processing.
Text recognition has aroused considerable interest in many areas of research, and many methods are used to detect text and recognise characters based on artificial intelligence systems. Feature learning methods are currently the focus of research, especially for the visually impaired. As a result, a wide range of algorithms is now available for learning features from data. Many results obtained with feature learning systems have shown that higher performance in recognition tasks can be achieved with richer feature representations. Consider the architecture used to learn the representation function and train the classifiers that detect and recognise characters. The basic architecture is closely related to a neural network, but thanks to the learning method it can be used to quickly build extensive sets of features with minimal configuration. Today, OCR technology builds on all previous text recognition algorithms, and text recognition is usually reduced to OCR.
OCR is a computer system designed to transform images of printed text (usually obtained using a scanner), i.e. pictures of characters, into machine-editable text in a standard encoding scheme. OCR is regarded as a field of research within artificial intelligence. Users want to scan images, save them in a document, and access the text of that document in .txt or .docx format.
Conversion is the first step in processing a scanned image. The scanned image is checked for shadow, skew and tilt [68-72]. There are options for capturing images with left or right orientation. The image is first converted to grayscale and then to binary. The result is an image suitable for further processing.
After pre-processing, the cleaned image is passed to the segmentation stage, where it is divided into individual characters. The binary image is checked for line spacing. The lines in the paragraphs are scanned to find horizontal gaps relative to the background, and the image histogram is used to detect the extent of the horizontal text lines. The lines are then scanned to find vertical gaps; here, histograms are used to determine the widths of words. The words are then broken down into characters using character-width calculations.
Feature extraction follows the OCR segmentation phase: each individual character image is considered, and its features are extracted. Classification is performed using the attributes extracted in the previous step, which correspond to each character. These features are analysed using a set of rules and assigned to different classes. The classification is generalised for one type of font, and similar classification rules are written for other characters. This method is standard because it extracts the shape of the characters and requires no training.
Optical character recognition thus aims to classify optical patterns (often contained in image regions) that correspond to alphanumeric or other symbols. The OCR process involves several stages, including segmentation, feature extraction, and classification. Virtually any standard OCR software can now be used to recognise text in segmented frames, although most OCR software packages still have significant difficulty recognising text in scene images. Document images differ from raw images because they contain mainly text with only a few graphic elements.
There are also many varieties of text recognition models. The principle of operation of a neural network is that, having received a new image on the input layer of neurons, the network responds with a pulse of a neuron. Because all neurons are labelled with letter values, the neuron that responded to the image corresponds to the recognised symbol. Going deeper into network terminology, each output neuron has many inputs. These inputs describe the pixel values of the image: for a 16x16 image, the network must have 256 inputs. Each input is taken with a specific coefficient (weight). As a result, a specific activation (charge) accumulates on each neuron at the end of recognition, and the neuron with the largest activation emits a pulse. However, for the input coefficients to be set correctly, the network must first be trained. A separate training module does this: it takes one image after another from the training sample and transmits it to the network. The network analyses the positions of all black pixels and adjusts the coefficients so as to minimise the matching error using the gradient method, after which a specific neuron is associated with this image.
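To make the description above concrete, the following is a minimal sketch of such a single-layer recogniser, not the authors' implementation: 256 pixel inputs for a 16x16 binary character image, one output neuron per letter, and the neuron with the largest weighted sum taken as the recognised symbol. The alphabet and weight values are placeholders that would come from a separate training step.

/** Minimal sketch of the single-layer character recogniser described above (illustrative only). */
public class SimpleCharClassifier {
    private final char[] alphabet;      // one output neuron per character
    private final double[][] weights;   // weights[neuron][pixel], trained elsewhere (e.g. by gradient descent)

    public SimpleCharClassifier(char[] alphabet, double[][] weights) {
        this.alphabet = alphabet;
        this.weights = weights;
    }

    /** pixels: 16x16 binary character image flattened to 256 values (0 = background, 1 = ink). */
    public char recognise(double[] pixels) {
        int best = 0;
        double bestCharge = Double.NEGATIVE_INFINITY;
        for (int n = 0; n < weights.length; n++) {
            double charge = 0.0;                      // "charge" accumulated by this neuron
            for (int p = 0; p < pixels.length; p++) {
                charge += weights[n][p] * pixels[p];  // each input is taken with its coefficient
            }
            if (charge > bestCharge) {                // the neuron with the largest charge "emits a pulse"
                bestCharge = charge;
                best = n;
            }
        }
        return alphabet[best];
    }
}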
The template methods are based on comparing the input graphic image with stored templates. The first step in a template method is to convert the scanned image into a bitmap. In the recognition process, the templates are enumerated, and the difference between the image and each template is calculated. The class whose templates have the smallest difference is the result of recognition. These methods are divided into two categories: font-dependent and font-independent. Font-independent methods use templates that are universal for all types of fonts; however, this approach reduces the likelihood of correct recognition. Font-dependent algorithms are designed for only one type of font, which improves the quality of their work, but they are entirely inoperable when other fonts are used. Methods have been proposed for recognising large volumes of text in which some characters are first reliably recognised by font-independent methods, and templates for the font-dependent algorithms are then built from the recognised characters. With the existing variety of printed products, it is impossible to cover all fonts and their modifications in the learning process. The advantages of this approach include ease of implementation, reliable operation in the absence of interference, high accuracy of recognition of flawed characters, and high speed with a small alphabet. The disadvantages include a strong dependence on templates and the difficulty of selecting optimal templates, the inability to recognise a font that differs from the one embedded in the system, slow operation with a large number of obstacles, and sensitivity to rotation, noise and distortion.
Feature methods are based on representing the image by an N-dimensional feature vector. The vector is formed during image analysis; this procedure is called feature extraction. The reference for each class is obtained by similar processing of the symbols in the training sample. The advantages of the method include ease of implementation, good generalisation ability, resistance to changes in the shape of the characters, and high speed. The disadvantages of this method include instability to various image defects and loss of information about the symbol at the stage of obtaining features.

4.2. Methods of working with pixels in image recognition
When working with the camera in the Android operating system, the RAW format is used, which is later converted to the JPEG format. RAW stores more information, but there is no compression, while JPEG is a processed file that is stored in a container. Saving in JPEG can carry risks such as loss of clarity and general blur. When the picture is taken, the light passes through the lens by the shortest path and hits the sensor, which is covered with red, blue and green filters. This is called a Bayer filter, consisting of 25% red, 25% blue and 50% green photocells (Fig. 7).
Figure 7: Bayer filter
It should be understood that a RAW file is information that contains a set of digital colour values assigned to each conditional pixel (Fig. 8). Later, the array of values from the matrix enters the image processor, where it is compressed into a JPEG file. Inside the processor, algorithms that bring the image to the correct form operate (Fig. 9). During image processing and development, the raw data is discarded, and only the picture with all its noise and artefacts remains.
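On the application side, as the next paragraphs note, Android delivers camera preview frames in a YUV layout rather than as RAW. A minimal sketch of two typical steps is shown below, assuming the NV21 preview format of the legacy camera API; it is an illustration of the general idea, not the authors' code.

import android.graphics.ImageFormat;
import android.graphics.Rect;
import android.graphics.YuvImage;
import java.io.ByteArrayOutputStream;

/** Illustrative helpers for handling an NV21 camera preview frame (not the authors' implementation). */
public final class PreviewFrameUtils {

    /** Compress an NV21 preview frame to JPEG, as the camera pipeline described above does internally. */
    public static byte[] nv21ToJpeg(byte[] nv21, int width, int height, int quality) {
        YuvImage yuv = new YuvImage(nv21, ImageFormat.NV21, width, height, null);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        yuv.compressToJpeg(new Rect(0, 0, width, height), quality, out);
        return out.toByteArray();
    }

    /** Extract the luminance (Y) plane as a grayscale image for OCR pre-processing:
     *  in NV21 the first width*height bytes are the Y values of the pixels. */
    public static int[] luminancePlane(byte[] nv21, int width, int height) {
        int[] gray = new int[width * height];
        for (int i = 0; i < gray.length; i++) {
            gray[i] = nv21[i] & 0xFF;   // unsigned 0..255 luminance
        }
        return gray;
    }

    private PreviewFrameUtils() {}
}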
According to the JPEG scheme, image compression occurs in several stages. The first step is to convert the image from the RGB space to the YUV space, which separates brightness and colour characteristics. Further work with the image is done in this model, which allows a high degree of compression to be obtained due to its components.
Figure 8: Images with different filters
Figure 9: Image handling
The Android operating system uses the YUV model when working with images. It consists of three components: brightness (Y) and two colour components (U and V) [25-30]. These components are defined from RGB according to the formulas:

Y = K_R × R + (1 − K_R − K_B) × G + K_B × B,   (1)
U = B − Y,   (2)
V = R − Y.   (3)

If a reverse transformation is required, it is performed according to the formulas:

R = V + Y,   (4)
G = Y − (K_R × V + K_B × U) / (1 − K_R − K_B),   (5)
B = U + Y.   (6)

The inverse conversion preserves the range of change of the RGB components, but the range of change of the components U and V is larger than that of Y, which is inconvenient for encoding and data transmission. Therefore, normalisation is introduced. By definition, the U component varies in the interval [−(1 − K_B)A, (1 − K_B)A), and V in the interval [−(1 − K_R)A, (1 − K_R)A), assuming that the RGB components change in the range [0, A). To bring them to the interval [−A/2, A/2), the components U and V are normalised according to the formula [25-30]:

U = (1/2) × (B − Y) / (1 − K_B),   V = (1/2) × (R − Y) / (1 − K_R).   (7)

If a reverse transformation is required [25-30], it is performed according to the formulas:

R = Y + 2 × V × (1 − K_R),   (8)
G = Y − (2 × K_R × V × (1 − K_R) + 2 × K_B × U × (1 − K_B)) / (1 − K_R − K_B),   (9)
B = Y + 2 × U × (1 − K_B).   (10)

This representation of the components is used for the analogue format [25-30]. The digital representation of YUV is the YCbCr format. For the digital data format, integer powers of two are used, most often 8-bit, 10-bit and so on. Because U and V may be negative, an offset of the coding levels is introduced. Spatial subsampling, such as YUYV or YUV422, is also used to thin out the less informative components. The coefficients K_R and K_B are taken as K_R = 0.299 and K_B = 0.114 [25-30]; the same values are used to convert the colour space to YPbPr and in JPEG.
The second stage of working with the image is its division into square sections of 8x8 pixels. After that, a transformation is performed over each section. During the conversion, each block is analysed: it is decomposed into colour components, and the frequency of occurrence of each colour is analysed (Fig. 10).
Figure 10: The principle of pixel formation
When compressed, the amount of information always depends on the image quality. At high compression levels, the details are entirely erased, and the image itself turns grey (Fig. 11).
Figure 11: Converting the image into black and white
At medium and low compression levels, the file stores approximate colour information. All these indicators depend on the degree of compression. JPEG stores the data as a Fourier-type series and rejects the higher-order terms of the series at high compression ratios. This causes one problem: if an image is saved in JPEG format, it cannot be restored to the last pixel. The JPEG format is therefore called a lossy format, and images saved in it degrade. Colour and brightness information is encoded so that only the differences between adjacent blocks are preserved. As a result, the blocks are represented by strings of numbers. Since the blocks contain many zeros after this processing, the last stage of coding gives good results.
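The conversions in formulas (1)-(10) can be written compactly in code. The sketch below uses the coefficients K_R = 0.299 and K_B = 0.114 mentioned above and the normalised form of equations (7)-(10); it is an illustration of the formulas, not the production code of the application.

/** Illustration of formulas (1)-(10): RGB <-> normalised YUV with K_R = 0.299, K_B = 0.114. */
public final class YuvMath {
    private static final double KR = 0.299;
    private static final double KB = 0.114;

    /** Forward transform: returns {Y, U, V} for RGB values in the range [0, A). */
    public static double[] rgbToYuv(double r, double g, double b) {
        double y = KR * r + (1 - KR - KB) * g + KB * b;       // formula (1)
        double u = 0.5 * (b - y) / (1 - KB);                  // formula (7), normalised U
        double v = 0.5 * (r - y) / (1 - KR);                  // formula (7), normalised V
        return new double[] { y, u, v };
    }

    /** Inverse transform of the normalised components: formulas (8)-(10). */
    public static double[] yuvToRgb(double y, double u, double v) {
        double r = y + 2 * v * (1 - KR);                                                // (8)
        double g = y - (2 * KR * v * (1 - KR) + 2 * KB * u * (1 - KB)) / (1 - KR - KB); // (9)
        double b = y + 2 * u * (1 - KB);                                                // (10)
        return new double[] { r, g, b };
    }

    private YuvMath() {}
}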
4.3. The principle of the speech synthesis algorithm
To voice a text, we need to analyse which synthesis method is best suited for the software. Text-to-speech synthesis is divided into the following types:
 Parametric synthesis;
 Compilation (concatenative) synthesis;
 Synthesis according to rules (from printed text).
Parametric synthesis can be used for any language, but it cannot be applied immediately to arbitrary finished text. This method only works with a limited amount of text, so the synthesis quality is relatively high. A tone generator is used for vowel sounds and a noise generator for consonant sounds. It is mainly used when recording music.
Consider another method, namely compilation synthesis. This method builds speech from a prepared collection of recorded elements. The synthesis element must be at least a word, and in general the number of elements is limited to a few hundred words. This method is used in everyday life, for example in help desks or machines equipped with a voice response system.
Another method is full synthesis according to rules, which does not use recorded human speech but vocal-linguistic algorithms. It is divided into two approaches. The first is formant synthesis of language according to rules, and the second is the articulatory approach. Formant synthesis is based on the frequency resonances of the human speech system: the algorithm simulates the operation of human speech, which works as a set of resonators. This universal and promising technology only keeps improving over the years. The articulatory approach works by adding the phonetic features of individual sounds to the model.
There is also a rule-based technology that uses recorded segments of natural language. It is divided into the following types of synthesis: allophone and diphone. In the diphone method, the basic elements are combinations of phonemes, and in the allophone method the left and right phonemes are combined; different types of contexts are integrated into classes according to the degree of acoustic proximity. The main advantage of such methods is that they make it possible to synthesise text according to an already specified pattern from libraries. The only disadvantage is that they do not reproduce authentically natural sound. It is also challenging to control speech characteristics, namely the tone, speed and context of a phrase. Despite the high level reached in this area, developers still have difficulty creating a perfect synthesis of language.

4.4. The structure of the program for text recognition and voicing
Since the image is not static but constantly changing, processing speed is essential. The frame rate on the phone ranges from 30 to 60 frames per second. To achieve maximum text recognition performance, the program circles the found text with a frame and places the recognised text inside it (Fig. 12).
Figure 12: Text recognition
The frame and the text in it are updated approximately every 15-30 frames to reduce the load on the phone; the user will hardly notice the difference. If some characters cannot be deciphered, the system skips them, and they are not visible on the screen. The system automatically deletes the old text frame after 15-30 frames and immediately creates a new one with new data. Once the text is found, the user taps on it and hears the text within the frame (Fig. 13).
Figure 13: Tap the screen
The algorithm works as follows: it takes, for example, 15 frames; the text recognition algorithm is started, and each frame is recognised.
These 15 frames are superimposed on each other, and when the text is recognised, the user sees the result on the phone. However, it should be noted that natural and technical factors affect the speed of text recognition, so the number of frames will vary. The research is performed while the brightness and the number of characters on the screen change; the results should show how these factors affect the operation of the algorithm.
The standard OCR algorithm checks each image separately, so at the slightest movement of the phone the frame changes and the text that needs to be recognised is lost (Fig. 14).
Figure 14: The principle of standard OCR when changing the frame
Such text recognition requires a very steady focus of the phone. The improved algorithm overlays the images, and thus no frames are lost; that is, the algorithm recognises the text on the phone screen more quickly.

4.5. Methods of text recognition research
The first stage of the research is a complete analysis of text recognition in video mode. The study is carried out using various sources of information located in Ukraine and abroad; the primary source of data is the World Wide Web, which is searched for articles on this topic. Based on the analysis, the task for the given topic is formed, along with the choice of research methods, the basic parameters for estimating the quality of the software and the algorithm, and the expected economic effect.
The second stage of the research is the solution of the tasks set at the first stage. The software product is tested in real conditions. The leading indicators for testing are the change in brightness and the number of characters on the phone screen. The study takes the form of 20 tests. Test results are shown as tables and diagrams for a better understanding of the information.
The last stage is the analysis of the obtained results and their presentation. They are demonstrated as tables and charts. The results are compared with analogues, and the quality of the software is summarised. The input for the research is the brightness and the number of characters on the screen. The input to the software product is an image that is constantly updated. The output of the software product is the recognised text and graphically drawn frames around this text, so that the user can tap and, as a result, the text itself is voiced (Fig. 15).
Figure 15: Input and output data of the software
Having considered the methods of text recognition and voicing, as well as the principle of the program's operation, we can draw the following conclusions:
 The Android Studio IDE and the Java programming language are used for software development;
 The speech synthesis built into the Android operating system is used to voice the selected text;
 Text recognition is performed by superimposing images on each other, which speeds up the algorithm;
 Text recognition requires several steps of image transformation, namely from RAW to JPEG and YUV, for better text recognition.

5. Results
The software is developed in Android Studio using the Java programming language. To use the Mobile Vision Text API, the Google Repository must first be upgraded. To get started, the dependency block needs to be changed to work with Google services, as the main character recognition library is located on Google's servers. Nevertheless, after connecting, the need for the Internet disappears, as everything required will be downloaded [68-72].
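The following sketch shows how such a dependency and the detector objects described in the next paragraphs might be wired together with the Mobile Vision API. The Gradle coordinate, version, preview size and frame rate are illustrative assumptions, not values taken from the paper.

// build.gradle dependency (coordinate and version illustrative):
// implementation 'com.google.android.gms:play-services-vision:20.1.3'

import android.content.Context;
import com.google.android.gms.vision.CameraSource;
import com.google.android.gms.vision.text.TextRecognizer;

/** Sketch of creating the text detector and camera source described below (illustrative). */
public class RecognizerSetup {

    public static TextRecognizer createTextRecognizer(Context context) {
        return new TextRecognizer.Builder(context).build();
    }

    public static CameraSource createCameraSource(Context context, TextRecognizer textRecognizer) {
        // The detector may not be operational yet, e.g. if the OCR dependencies are still downloading
        // or the device has too little storage.
        if (!textRecognizer.isOperational()) {
            throw new IllegalStateException("Text recognizer dependencies are not yet available");
        }
        // High resolution and autofocus help with small text; lower values make frame processing faster.
        return new CameraSource.Builder(context, textRecognizer)
                .setFacing(CameraSource.CAMERA_FACING_BACK)
                .setRequestedPreviewSize(1280, 1024)
                .setRequestedFps(30.0f)
                .setAutoFocusEnabled(true)
                .build();
    }
}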
An XML file is created that initialises the screen elements required for the program's correct operation [70]. Then a TextRecognizer is created. This detector object processes images and determines what text appears inside them. After initialisation, the TextRecognizer is used to detect text in all types of images [68, 72].
The TextRecognizer may not be ready to work: if the device does not have enough memory or Google Play Services cannot load the OCR dependencies, the TextRecognizer object will not work [68]. Before using it to recognise text, you need to check whether it is ready [68]. This check is added when creating the CameraSource, after initialising the TextRecognizer [68]. Once it is verified that the TextRecognizer is ready, it can be used to recognise individual frames [68].
Since the main functionality is text recognition in camera mode, we create a CameraSource, which is pre-configured to control the camera. High-quality shooting and autofocus have to be set to cope with the task of recognising small text. If users look at large blocks of text, such as signs, they can use lower quality, and then frame processing will be faster [68].
The application can now detect text in individual frames using the detection method of the TextRecognizer; therefore, it can find the text, for example, in a photo. However, to read the text directly during video recording, you need to implement a Processor that handles the text as soon as it appears on the screen. The Processor class implements the Detector.Processor interface, for which two methods need to be overridden. The first, receiveDetections, receives the TextBlocks detected by the TextRecognizer [68]. The second, release, is used to free resources when the TextRecognizer is destroyed; in this case, it is enough to clear the graphics canvas, which deletes all OcrGraphic objects. The processor gets the TextBlocks and creates a ScreenFrame object for each detected text block. Now that the processor is ready, the textRecognizer has to be configured to use it [68].
The draw method is implemented in ScreenFrame. It checks whether the image has text, converts the coordinates of its boundaries into a frame, and then draws both the frame and the text [68, 72]. Now the text from the camera is converted into structured lines, and these lines are displayed on the screen.
Using the TextToSpeech API built into Android and a method in ScreenFrame, the application can be taught to speak aloud when the user taps the text [68]. First, the tap handling is implemented in ScreenFrame: it checks that the x and y coordinates are within the displayed text box [72]. The image overlay method is used to increase the speed of text recognition; it calculates the coordinates of the text and the text itself [69-70].
TextToSpeech also depends on the recognition language, and the language can be changed based on the recognised text [68]. Language identification is not built into the Mobile Vision Text API, so we use the Google Translate API. The language of the user's device can be used as the language for text recognition. Next, the correctness of the text and the number of words within the frame are checked. After that, a menu is created where the user selects the language and the checks he needs. Finally, a method for tapping the screen is created [69].
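A possible shape of this tap-to-speak logic is sketched below. Only the Android TextToSpeech calls are standard API; the ScreenFrame fields (bounding rectangle and recognised text) are assumptions made for illustration, not the paper's actual class.

import android.content.Context;
import android.graphics.RectF;
import android.speech.tts.TextToSpeech;
import java.util.Locale;

/** Sketch of voicing the text of a tapped frame; ScreenFrame's fields here are illustrative assumptions. */
public class TapToSpeak {
    private final TextToSpeech tts;

    public TapToSpeak(Context context) {
        // Initialise the engine and set the language once it is ready.
        tts = new TextToSpeech(context, status -> {
            if (status == TextToSpeech.SUCCESS) {
                tts.setLanguage(new Locale("uk", "UA")); // or Locale.ENGLISH for English text
            }
        });
    }

    /** Hypothetical frame holding the recognised text and its on-screen bounds. */
    public static class ScreenFrame {
        final RectF bounds;
        final String text;
        ScreenFrame(RectF bounds, String text) { this.bounds = bounds; this.text = text; }
        boolean contains(float x, float y) { return bounds.contains(x, y); }
    }

    /** Called from the view's touch handler: voice the frame that contains the tap point, if any. */
    public boolean onTap(Iterable<ScreenFrame> frames, float x, float y) {
        for (ScreenFrame frame : frames) {
            if (frame.contains(x, y)) {
                // Four-argument speak() requires API 21+; older devices use the deprecated three-argument form.
                tts.speak(frame.text, TextToSpeech.QUEUE_FLUSH, null, "ocr-frame");
                return true;
            }
        }
        return false;
    }
}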
6. Experiments and discussion
6.1. The principle of operation of the developed algorithm and performance tests
Java software and tools for text recognition and voicing in video mode are used to develop the software. The image is not static but constantly changing, so an important aspect is processing speed. The frame rate on the phone ranges from 30 to 60 frames per second. The program circles the found text with a frame to achieve maximum text recognition performance. The frame and the text in it are updated approximately every 15-30 frames to reduce the load on the phone. If the program finds characters that cannot be deciphered, the system skips them, and they are not visible on the screen. The system automatically removes the old text frame after 15-30 frames and immediately creates a new one with new data. After the text is found, the user taps on it and hears the text within the frame (Fig. 16).
Figure 16: Tapping the screen
The algorithm works as follows: it takes, for example, 15 frames; the text recognition algorithm starts, and each frame is recognised; then these 15 frames are superimposed on each other, and when the text is recognised, the user sees the result on the phone. However, it should be noted that natural and technical factors affect the speed of text recognition, so the number of frames will vary. A study was conducted while adjusting the brightness and changing the number of characters on the screen; the results showed how these factors affect the operation of the algorithm. The standard OCR algorithm checks each image separately, so at the slightest movement of the phone the frame changes and the text that needs to be recognised is lost; such recognition requires a very steady focus of the phone. The improved algorithm overlays the images, and thus no frames are lost; that is, the algorithm recognises the text on the phone screen more quickly (Fig. 17).
Figure 17: The principle of operation of the improved algorithm when overlaying images
A file with the .XML extension is created to recognise text in English and Ukrainian. It contains the descriptions of the Ukrainian symbols, namely uppercase and lowercase letters; each character is given its own name depending on its outline. The next step is to connect this file to the class that recognises the text in the image. Text recognition takes place as follows. A symbol is taken from the screen and compared with a symbol from the XML file (Fig. 15). The character from the picture is converted to a specific standard so that the characters from the image and from the XML file resemble each other; namely, the font and its size are changed. In our case, it is Courier New 10 pt (Fig. 18).
Figure 18: Bringing the character in the image to a specific standard
The characters in the image and in the file are then compared; for this, a character-matching procedure from OCR technology is used. Once a character is matched, it is stored in a temporary buffer, which consistently accumulates the symbols. After the recognition of all the text is completed, the result is displayed on the screen as a frame with the recognised text inside. Afterwards, the temporary buffer is freed so that the text can be recognised again when the image changes, and the new image is processed anew. To do this, the main class is joined by a class that works with an array of frames (Fig. 19). The result is the recognition of Ukrainian text when changing frames in video mode.
Figure 19: Text recognition
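The description above leaves the exact overlay rule open; one plausible reading is a small per-frame accumulator that buffers each frame's recognition result and publishes the most frequent one once roughly 15 frames have been collected. The sketch below follows that reading and is an assumption made for illustration, not the authors' implementation.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Illustrative frame accumulator: combines the OCR results of ~15 consecutive frames
 *  and reports the most frequently seen text, as one reading of the overlay idea above. */
public class FrameAccumulator {
    private final int windowSize;
    private final Deque<String> window = new ArrayDeque<>();

    public FrameAccumulator(int windowSize) {    // e.g. 15 frames
        this.windowSize = windowSize;
    }

    /** Add one frame's recognised text; returns the stabilised text once the window is full, else null. */
    public String addFrameResult(String recognisedText) {
        window.addLast(recognisedText == null ? "" : recognisedText);
        if (window.size() < windowSize) {
            return null;                         // keep buffering frames
        }
        Map<String, Integer> counts = new HashMap<>();
        for (String text : window) {
            counts.merge(text, 1, Integer::sum); // count how often each result was seen
        }
        window.clear();                          // start a new window of frames
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }
}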
6.2. Software testing
The resulting software is tested to verify its quality and compliance with the requirements specified in the requirements specification. The quality of the system is checked during the testing process. Functional testing is chosen to test the system, as the software must function properly under certain conditions. The following test cases and test data sets are created according to Tables 1-2 to verify the quality of the developed software.

Table 1
Functional test cases
Use case | Test cases | Test data sets
Text recognition in video mode | 20 | 5
Working with the screen | 20 | 5
Choice of language for text recognition and voicing | 20 | 5
Total | 60 | 15

Table 2
General description of the test cases
TestCase id | Steps | Expected results
1 - Text recognition in video mode | Run the program. | The text should appear on the screen and change dynamically.
2 - Work with the screen | 1. Run the program. 2. Wait until the text is recognised. 3. Tap the frame in which the text is located. | The recognised text will be displayed on the screen and framed. The frame will become active for use.
3 - Choice of language for text recognition and voicing | 1. Run the program. 2. Select Ukrainian or English from the menu. | Select the language from the menu and wait until the text is recognised.
Test result - passed successfully.

Functional testing is passed successfully, and compliance with the requirements is confirmed. The following indicators are used to determine the success of the project:
 The developed tests cover the requirement specifications;
 Various options of input data are tested.
So, all test cases are performed successfully with a positive result.

6.3. Research on the accuracy of text recognition
Any imperfect text causes difficulties even for the most advanced OCR system, and insufficient image processing significantly reduces accuracy. For example, recognition accuracy may drop by 20% when characters are separated due to poor image quality or when several characters merge because of a blurred or dark background. First of all, a high-quality image is required for successful text recognition. A good example is text printed on a laser printer or on a magazine or book page; text in newspapers printed on high-quality paper is also recognised well. When viewing an image from a camera or video, clear focus and the right conditions are needed: character boundaries should be clear, the font should be almost black, and the background should be close to white.
As a rule, text recognition accuracy under favourable conditions exceeds 98%, i.e. one error per 50 characters or less. However, if one of the conditions is violated, the accuracy of recognition may fall by 15-20%. Difficulties in recognising text occur when the image's contrast changes, for example a yellowed background or pale or faded print of the text. In addition, an important aspect is the recognition of the text when the lighting changes: as the brightness of the image decreases, the algorithm loses accuracy. According to the results of the study, the accuracy dropped by about 10-15%. These experiments were performed only with a class that recognised text from a single image.
The next step was to study the accuracy of text recognition in video. The factors that influenced text recognition remained the same, but another factor in the deterioration of accuracy was the dynamic change of the image, i.e. the array of frames that changed all the time. Recognition accuracy dropped to 40%. The research is carried out in several stages.
Because the image is updated dynamically and under different natural conditions, namely different brightness and a different number of characters in the text to be recognised, several graphs and tables are built to understand the algorithm better. In the first stage, 20 tests were conducted that show after how many frames the text is recognised and displayed to the user on the phone screen when the brightness changes, according to Table 3.

Table 3
Testing text recognition when the brightness changes
Test No. | Number of frames | Illumination, lux | Test No. | Number of frames | Illumination, lux
1 | 12 | 980 | 11 | 18 | 600
2 | 14 | 900 | 12 | 19 | 580
3 | 14 | 860 | 13 | 20 | 550
4 | 14 | 840 | 14 | 19 | 530
5 | 15 | 800 | 15 | 20 | 500
6 | 17 | 780 | 16 | 19 | 490
7 | 16 | 720 | 17 | 18 | 470
8 | 19 | 700 | 18 | 19 | 430
9 | 17 | 650 | 19 | 17 | 400
10 | 18 | 630 | 20 | 19 | 350

The graph clearly shows that the darker the image on the phone, the longer it takes to recognise the text on it; on average, about 18 frames are needed (Fig. 20). The examples show how the program recognises text when the brightness changes: the darkest image is at the top left, and the lightest of the sample is at the bottom right (Fig. 21).
Figure 20: Testing the image when the brightness changes (vertical axis: number of frames; horizontal axis: test number)
Figure 21: Examples of text recognition when the light changes
Next, a study of text recognition with different numbers of characters is carried out. For a more accurate analysis, the texts are taken with a difference of one character, according to Table 4.

Table 4
Testing when changing the number of characters
Test No. | Number of frames | Number of characters | Test No. | Number of frames | Number of characters
1 | 18 | 3 | 11 | 23 | 13
2 | 20 | 4 | 12 | 25 | 14
3 | 21 | 5 | 13 | 28 | 15
4 | 21 | 6 | 14 | 30 | 16
5 | 22 | 7 | 15 | 31 | 17
6 | 22 | 8 | 16 | 32 | 18
7 | 22 | 9 | 17 | 30 | 19
8 | 23 | 10 | 18 | 34 | 20
9 | 26 | 11 | 19 | 33 | 21
10 | 24 | 12 | 20 | 32 | 22

Based on the results, we can say that the execution of the text recognition algorithm slows down as the number of characters on the screen increases (Fig. 22). The examples show precisely how the program recognises the text and the boundaries drawn around it (Fig. 23).
Figure 22: Testing the image when the number of characters changes (vertical axis: number of frames; horizontal axis: test number)
Figure 23: Examples of text recognition when changing the number of characters
In the last study, the algorithm was tested with both a change in brightness and an increasing number of characters. Twenty tests were performed, which showed the results given in Table 5. The graph clearly shows that the text recognition algorithm slows down considerably because many factors affect its operation at once (Fig. 24).
Table 5
Testing of both parameters
Test No. | Number of frames | Number of characters | Illumination, lux
1 | 18 | 3 | 980
2 | 20 | 4 | 900
3 | 21 | 5 | 860
4 | 22 | 6 | 840
5 | 23 | 7 | 800
6 | 22 | 8 | 780
7 | 21 | 9 | 720
8 | 23 | 10 | 700
9 | 25 | 11 | 650
10 | 24 | 12 | 630
11 | 23 | 13 | 600
12 | 24 | 14 | 580
13 | 29 | 15 | 550
14 | 30 | 16 | 530
15 | 32 | 17 | 500
16 | 32 | 18 | 490
17 | 31 | 19 | 470
18 | 34 | 20 | 430
19 | 34 | 21 | 400
20 | 35 | 22 | 350

Figure 24: Testing the image when both the number of characters and the brightness change (vertical axis: number of frames; horizontal axis: test number)
The research revealed that natural and technical conditions affect the algorithm's operation and its text recognition speed: the darker the image or the more characters it contains, the more frames are needed to process it.
To compare the quality and speed of the algorithm, we take the well-known OCR algorithm. A study of the operation of this algorithm is conducted under the same conditions as when testing our product, and graphs are formed for a clearer view of the research results. Twenty tests were also performed. The chart clearly shows that the OCR algorithm needs more time and more frames to recognise the text; based on the results, on average it needs one frame more than our development (Fig. 25).
Figure 25: Testing the image when the brightness changes (blue: improved algorithm; orange: OCR algorithm)
Testing is also performed when changing the number of characters for both algorithms. The graph shows that text recognition when changing the number of characters differs significantly from the tests when adjusting the brightness. The OCR algorithm takes more time and more frames to recognise the text compared to our development; here, the difference is 1.5-2 frames (Fig. 26).
Figure 26: Testing the image when changing the number of characters (blue: improved algorithm; orange: OCR algorithm)
The last test compared the algorithms when both the number of characters and the brightness change. The graph shows no dramatic differences, but a difference between the curves is still observed (Fig. 27). We can conclude that the improved algorithm increases productivity by 4-5%; more precisely, text recognition takes 1-2 frames fewer than with the OCR algorithm.
Figure 27: Testing the image when changing the number of characters and changing the brightness (blue: improved algorithm; orange: OCR algorithm)
Faster processing is achieved because the standard OCR algorithm checks each image separately: at the slightest movement of the phone, the frame changes, and the text that needs to be recognised is lost. We also looked at how the program responds to different sizes and fonts. Fig. 28 shows how the program handles different fonts, and Fig. 29 shows an example of text recognition at different sizes.
Figure 28: Text recognition in different fonts
Figure 29: Text recognition at different sizes
We can conclude that the program works well with different types of text – different fonts and sizes.
7. Conclusion
The developed software demonstrates an improved high-speed algorithm for recognising printed texts and voicing them on mobile devices, intended to help people with visual impairments and/or people who are learning the correct pronunciation of foreign words. The results of the research project are:
 Implementation of a text recognition algorithm in video recording mode for people with visual impairments;
 An algorithm for recognition in Ukrainian and English in video recording mode;
 Implementation of the algorithm for voicing the text when the user taps the screen;
 Image update and recognition of the found text at the camera frame rate (30-60 fps);
 Support for Android 4.0+;
 Implementation of highlighting the boundaries of the found text;
 Software for the Android operating system for the visually impaired and/or people learning the correct pronunciation of foreign words.
Studies have shown that the improved algorithm speeds up text recognition by 4-5%, which helps to recognise text better, although the effectiveness depends on external factors. A study of text recognition accuracy showed that the accuracy also depends on external factors, which can impair recognition by about 40%. For further research, new functionality and features of the software product will be added:
 Updating the user interface for easier use of the software;
 Further optimisation of the text recognition algorithm for video recording mode;
 Research on the algorithm to assess the behaviour of the program when the screen tilt changes (from 0 to 90 degrees);
 Analysis of handwritten texts to see how the program interacts with them;
 Adding new languages for text recognition and voicing.
8. References
[1] S. Deshpande, R. Shriram, Real time text detection and recognition on hand held objects to assist blind people, in: IEEE 2016 International Conference on Automatic Control and Dynamic Optimisation Techniques (ICACDOT), 2016, pp. 1020-1024. [2] Z. Liu, Y. Li, F. Ren, W. L. Goh, H. Yu, Squeezedtext: A real-time scene text recognition by binary convolutional encoder-decoder network, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 32(1), 2018. [3] Z. Cheng, Y. Xu, F. Bai, Y. Niu, S. Pu, S. Zhou, Aon: Towards arbitrarily-oriented text recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 5571-5579. [4] H. Li, P. Wang, C. Shen, G. Zhang, Show, attend and read: A simple and strong baseline for irregular text recognition, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 33(01), 2019, pp. 8610-8617. [5] F. Bai, Z. Cheng, Y. Niu, S. Pu, S. Zhou, Edit probability for scene text recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 1508-1516. [6] N. Shakhovska, O. Basystiuk, K. Shakhovska, Development of the Speech-to-Text Chatbot Interface Based on Google API, volume Vol-2386 of CEUR Workshop Proceedings, 2019, pp. 212-221. [7] N. Grabar, T. Hamon, Automatic Detection of Temporal Information in Ukrainian General-language Texts, volume Vol-2136 of CEUR Workshop Proceedings, 2018, pp. 1-10. [8] ABBYY FineReader 10 Home Edition, 2020. URL: https://www.abbyy.com. [9] A. P. Tafti, A. Baghaie, M. Assefi, H. R. Arabnia, Z. Yu, P.
Peissig, OCR as a service: an experimental evaluation of Google Docs OCR, Tesseract, ABBYY FineReader, and Transym, in: International Symposium on Visual Computing, Springer, Cham, 2016, pp. 735-746.
[10] M. Koistinen, K. Kettunen, J. Kervinen, How to Improve Optical Character Recognition of Historical Finnish Newspapers Using Open Source Tesseract OCR Engine – Final Notes on Development and Evaluation, in: Language and Technology Conference, 2017, pp. 17-30.
[11] F. Alkhateeb, I. A. Doush, A. Albsoul, Arabic optical character recognition software: A review, volume 27(4) of Pattern Recognition and Image Analysis, 2017, pp. 763-776.
[12] R. Giritharan, A. G. Ramakrishnan, Color Components Clustering and Power Law Transform for the Effective Implementation of Character Recognition in Color Images, in: Progress in Intelligent Computing Techniques: Theory, Practice, and Applications, 2018, pp. 131-139.
[13] OcrCuneiform, 2020. URL: https://softcatalog.info/ru/programmy/ocr-cuneiform.
[14] K. Yamauchi, H. Yamamoto, W. Mori, Building a handwritten cuneiform character imageset, in: Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018.
[15] L. Rothacker, D. Fisseler, G. G. Müller, F. Weichert, G. A. Fink, Retrieving cuneiform structures in a segmentation-free word spotting framework, in: Proceedings of the 3rd International Workshop on Historical Document Imaging and Processing, 2015, pp. 129-136.
[16] S. M. H. Mousavi, V. Lyashenko, Extracting old persian cuneiform font out of noisy images (handwritten or inscription), in: IEEE 10th Iranian Conference on Machine Vision and Image Processing (MVIP), 2017, pp. 241-246.
[17] CamScanner, 2020. URL: https://www.camscanner.com.
[18] C. P. Chandrika, J. S. Kallimani, Polarity Identification for Handwritten Text in Multilingual Documents Using Open Source Optical Character Recognition Tools, volume 17(9-10) of Journal of Computational and Theoretical Nanoscience, 2020, pp. 4045-4049.
[19] O. Bamasag, M. Tayeb, M. Alsaggaf, F. Shams, Nateq Reading Arabic Text for Visually Impaired People, in: International Conference on Universal Access in Human-Computer Interaction, 2018, pp. 311-326.
[20] A. Maulidiyah, Translation Strategies of Noun Phrases with Derived Noun as Head in Academic Text, Doctoral dissertation, UNIVERSITAS 17 AGUSTUS 1945, 2018.
[21] N. Mannov, C. M. Lüders, A. Kaznin, ReqVision: Digitising Your Analog Notes into Readable and Editable Data, in: IEEE 4th International Workshop on Requirements Engineering for Self-Adaptive, Collaborative, and Cyber Physical Systems (RESACS), 2018, pp. 20-23.
[22] V. Lytvynenko, N. Savina, J. Krejci, M. Voronenko, M. Yakobchuk, O. Kryvoruchko, Bayesian Networks' Development Based on Noisy-MAX Nodes for Modeling Investment Processes in Transport, volume Vol-2386 of CEUR Workshop Proceedings, 2019, pp. 1-10.
[23] V. Lytvyn, V. Vysotska, P. Pukach, І. Bobyk, D. Uhryn, Development of a method for the recognition of author's style in the Ukrainian language texts based on linguometry, stylemetry and glottochronology, volume 4(2-88) of Eastern-European Journal of Enterprise Technologies, 2017, pp. 10-19.
[24] C. Shu, D. Dosyn, V. Lytvyn, V. Vysotska, A. Sachenko, S. Jun, Building of the Predicate Recognition System for the NLP Ontology Learning Module, in: International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, IDAACS, 2, 2019, pp. 802-808.
[25] B. Rusyn, O. Lutsyk, R. Kosarevych, Y.
Varetsky, Automated Recognition of Numeric Display Based on Deep Learning, in: Proceedings of the 3rd International Conference on Advanced Information and Communications Technologies, AICT, 2019, pp. 244-247.
[26] Yu. M. Furgala, B. P. Rusyn, Peculiarities of melin transform application to symbol recognition, in: 14th International Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering, TCSET, 2018, pp. 251-254.
[27] B. E. Kapustiy, B. P. Rusyn, V. A. Tayanov, Peculiarities of application of statistical detection criteria for problems of pattern recognition, volume 37(2) of Journal of Automation and Information Sciences, 2005, pp. 30-36.
[28] B. O. Kapustiy, B. P. Rusyn, V. A. Tayanov, A new approach to determination of correct recognition probability of set objects, volume 2 of Upravlyaushchie Sistemy i Mashiny, 2005, pp. 8-12.
[29] O. Veres, I. Rishnyak, H. Rishniak, Application of Methods of Machine Learning for the Recognition of Mathematical Expressions, volume Vol-2362 of CEUR Workshop Proceedings, 2019, pp. 378-389.
[30] V. Lytvyn, I. Peleshchak, R. Peleshchak, Increase the speed of detection and recognition of computer attacks in combined diagonalised neural networks, in: International Scientific-Practical Conference Problems of Infocommunications Science and Technology, 2018, pp. 152-155.
[31] P. Zhezhnych, O. Markiv, Recognition of tourism documentation fragments from web-page posts, in: 14th International Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering, TCSET, 2018, pp. 948-951.
[32] P. Zhezhnych, O. Markiv, Linguistic comparison quality evaluation of web-site content with tourism documentation objects, volume 689 of Advances in Intelligent Systems and Computing, 2018, pp. 656-667.
[33] P. Zhezhnych, O. Markiv, A linguistic method of web-site content comparison with tourism documentation objects, in: International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT, 2017, pp. 340-343.
[34] N. Melnykova, O. Markiv, Semantic approach to personalisation of medical data, in: Computer Sciences and Information Technologies, CSIT, 2016, pp. 59-61.
[35] R. Martsyshyn, M. Medykovskyy, L. Sikora, (...), N. Lysa, B. Yakymchuk, Technology of speaker recognition of multimodal interfaces automated systems under stress, in: The Experience of Designing and Application of CAD Systems in Microelectronics, CADSM, 2013, pp. 447-448.
[36] V. Lytvyn, V. Vysotska, Y. Burov, O. Veres, I. Rishnyak, The Contextual Search Method Based on Domain Thesaurus, volume 689 of Advances in Intelligent Systems and Computing, 2018, pp. 310-319.
[37] J. Su, V. Vysotska, A. Sachenko, V. Lytvyn, Y. Burov, Information resources processing using linguistic analysis of textual content, in: Intelligent Data Acquisition and Advanced Computing Systems Technology and Applications, Romania, 2017, pp. 573-578.
[38] V. Vysotska, Linguistic Analysis of Textual Commercial Content for Information Resources Processing, in: Modern Problems of Radio Engineering, Telecommunications and Computer Science, TCSET, 2016, pp. 709-713.
[39] V. Lytvyn, V. Vysotska, D. Dosyn, R. Holoschuk, Z. Rybchak, Application of Sentence Parsing for Determining Keywords in Ukrainian Texts, in: International Conference on Computer Sciences and Information Technologies, CSIT, 2017, pp. 326-331.
[40] V. Vysotska, V. Lytvyn, Y. Burov, P. Berezin, M. Emmerich, V. B.
Fernandes, Development of Information System for Textual Content Categorizing Based on Ontology, volume Vol-2362 of CEUR Workshop Proceedings, 2019, pp. 53-70.
[41] V. Lytvyn, V. Vysotska, O. Veres, I. Rishnyak, H. Rishnyak, Classification methods of text documents using ontology based approach, volume 512 of Advances in Intelligent Systems and Computing, 2017, pp. 229-240.
[42] V. Lytvyn, V. Vysotska, I. Peleshchak, T. Basyuk, V. Kovalchuk, S. Kubinska, L. Chyrun, B. Rusyn, L. Pohreliuk, T. Salo, Identifying Textual Content Based on Thematic Analysis of Similar Texts in Big Data, in: Proceedings of the International Conference on Computer Sciences and Information Technologies, CSIT, 2019, pp. 84-91.
[43] O. Bisikalo, V. Vysotska, Linguistic analysis method of Ukrainian commercial textual content for data mining, volume Vol-2608 of CEUR Workshop Proceedings, 2020, pp. 224-244.
[44] V. Lytvyn, V. Vysotska, I. Budz, Y. Pelekh, N. Sokulska, R. Kovalchuk, L. Dzyubyk, O. Tereshchuk, M. Komar, Development of the quantitative method for automated text content authorship attribution based on the statistical analysis of N-grams distribution, volume 6(2-102) of Eastern-European Journal of Enterprise Technologies, 2019, pp. 28-51.
[45] V.-A. Oliinyk, V. Vysotska, Y. Burov, K. Mykich, V. Basto-Fernandes, Propaganda Detection in Text Data Based on NLP and Machine Learning, volume Vol-2631 of CEUR Workshop Proceedings, 2020, pp. 132-144.
[46] V. Vasyliuk, Y. Shyika, T. Shestakevych, Information System of Psycholinguistic Text Analysis, volume Vol-2604 of CEUR Workshop Proceedings, 2020, pp. 178-188.
[47] O. Artemenko, V. Pasichnyk, N. Kunanets, K. Shunevych, Using sentiment text analysis of user reviews in social media for e-tourism mobile recommender systems, volume Vol-2604 of CEUR Workshop Proceedings, 2020, pp. 259-271.
[48] I. Gruzdo, I. Kyrychenko, G. Tereshchenko, O. Cherednichenko, Application of Paragraphs Vectors Model for Semantic Text Analysis, volume Vol-2604 of CEUR Workshop Proceedings, 2020, pp. 283-293.
[49] O. Kuropiatnyk, V. Shynkarenko, Text Borrowings Detection System for Natural Language Structured Digital Documents, volume Vol-2604 of CEUR Workshop Proceedings, 2020, pp. 294-305.
[50] M. Sazhok, V. Robeiko, R. Seliukh, D. Fedoryn, O. Yukhymenko, Written Form Extraction of Spoken Numeric Sequences in Speech-to-Text Conversion for Ukrainian, volume Vol-2604 of CEUR Workshop Proceedings, 2020, pp. 442-451.
[51] V. Lytvyn, S. Kubinska, A. Berko, T. Shestakevych, L. Demkiv, Y. Shcherbyna, Peculiarities of Generation of Semantics of Natural Language Speech by Helping Unlimited and Context-Dependent Grammar, volume Vol-2604 of CEUR Workshop Proceedings, 2020, pp. 536-551.
[52] R. Bekesh, L. Chyrun, P. Kravets, A. Demchuk, Y. Matseliukh, T. Batiuk, I. Peleshchak, R. Bigun, I. Maiba, Structural Modeling of Technical Text Analysis and Synthesis Processes, volume Vol-2604 of CEUR Workshop Proceedings, 2020, pp. 562-589.
[53] A. Taran, Information-retrieval System "Base of the World Slavic Linguistics (iSybislaw)" in Language Education, volume Vol-2604 of CEUR Workshop Proceedings, 2020, pp. 590-599.
[54] L. Chyrun, Model of Adaptive Language Synthesis Based on Cosine Conversion Furies with the Use of Continuous Fractions, volume Vol-2604 of CEUR Workshop Proceedings, 2020, pp. 600-611.
[55] T. Kovaliuk, N. Kobets, G. Shekhet, T. Tielysheva, Analysis of Streaming Video Content and Generation Relevant Contextual Advertising, volume Vol-2604 of CEUR Workshop Proceedings, 2020, pp. 829-843.
[56] I.
Khomytska, V. Teslyuk, A. Holovatyy, O. Morushko, Development of methods, models, and means for the author attribution of a text, volume 3(2-93) of Eastern-European Journal of Enterprise Technologies, 2018, pp. 41-46.
[57] I. Khomytska, V. Teslyuk, Authorship and Style Attribution by Statistical Methods of Style Differentiation on the Phonological Level, volume 871 of Advances in Intelligent Systems and Computing III, AISC, Springer, 2019, pp. 105-118.
[58] V. Vysotska, V. Lytvyn, V. Kovalchuk, S. Kubinska, M. Dilai, B. Rusyn, L. Pohreliuk, L. Chyrun, S. Chyrun, O. Brodyak, Method of Similar Textual Content Selection Based on Thematic Information Retrieval, in: Proceedings of the International Conference on Computer Sciences and Information Technologies, CSIT, 2019, pp. 1-6.
[59] V. Vysotska, Ukrainian Participles Formation by the Generative Grammars Use, volume Vol-2604 of CEUR Workshop Proceedings, 2020, pp. 407-427.
[60] O. Bisikalo, V. Vysotska, Y. Burov, P. Kravets, Conceptual Model of Process Formation for the Semantics of Sentence in Natural Language, volume Vol-2604 of CEUR Workshop Proceedings, 2020, pp. 151-177.
[61] O. Cherednichenko, N. Babkova, O. Kanishcheva, Complex Term Identification for Ukrainian Medical Texts, volume Vol-2255 of CEUR Workshop Proceedings, 2018, pp. 146-154.
[62] N. Sharonova, A. Doroshenko, O. Cherednichenko, Issues of fact-based information analysis, volume Vol-2136 of CEUR Workshop Proceedings, 2018, pp. 11-19.
[63] U. Shandruk, Quantitative Characteristics of Key Words in Texts of Scientific Genre (on the Material of the Ukrainian Scientific Journal), volume Vol-2362 of CEUR Workshop Proceedings, 2019, pp. 163-172.
[64] Y. Bobalo, P. Stakhiv, B. Mandziy, N. Shakhovska, R. Holoschuk, The concept of electronic textbook "Fundamentals of theory of electronic circuits", volume 88(3 A) of Przeglad Elektrotechniczny, 2012, pp. 16-18.
[65] N. B. Shakhovska, R. Yu. Noha, Methods and tools for text analysis of publications to study the functioning of scientific schools, volume 47(12) of Journal of Automation and Information Sciences, 2015, pp. 29-43.
[66] J. Pach, P. Bilski, A Robust Binarization and Text Line Detection in Historical Handwritten Documents Analysis, volume 15(3) of International Journal of Computing, 2016, pp. 154-161.
[67] T. Batura, A. Bakiyeva, M. Charintseva, A method for automatic text summarisation based on rhetorical analysis and topic modeling, volume 19(1) of International Journal of Computing, 2020, pp. 118-127.
[68] Google Codelabs, 2020. URL: codelabs.developers.google.com.
[69] Android, 2020. URL: www.andrious.com.
[70] Insights on Kotlin, 2020. URL: 101droid.wordpress.com.
[71] R. Seethalakshmi, T. R. Sreeranjani, T. Balachandar, A. Singh, M. Singh, R. Ratan, S. Kumar, Optical character recognition for printed Tamil text using Unicode, volume 6(11) of Journal of Zhejiang University-SCIENCE A, 2005, pp. 1297-1305.
[72] Get AI Code Completions for your IDE, 2020. URL: www.codota.com.