<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Effectiveness of Modern Text Recognition Solutions and Tools for Common Data Sources</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kirill Smelyakov</string-name>
          <email>kyrylo.smelyakov@nure.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anastasiya Chupryna</string-name>
          <email>anastasiya.chupryna@nure.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmytro Darahan</string-name>
          <email>dmytro.darahan@nure.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Serhii Midina</string-name>
          <email>serhii.midina@nure.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kharkiv National University of Radio Electronics</institution>
          ,
          <addr-line>14 Nauky Ave., Kharkiv, 61166</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This article examines the operation of the most common optical character recognition (OCR) tools, EasyOCR and TesserOCR. An experimental analysis of their recognition results is given for the most widespread data sources: electronic text documents, internet resources, and banners. Based on the experimental results, a comparative analysis of the considered OCRs by time and accuracy is carried out, and an effective algorithm for applying an OCR, together with recommendations for its use, is offered for undistorted data as well as for slightly and heavily distorted data.</p>
      </abstract>
      <kwd-group>
        <kwd>Optical character recognition (OCR)</kwd>
        <kwd>text recognition</kwd>
        <kwd>efficiency estimation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Optical character recognition (OCR) is widely used to extract text from printed documents, books, posters, business cards, electronic text documents, and internet resources, as well as to automate data entry [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1, 2, 3, 4</xref>
        ].
      </p>
      <p>
        The output of recognition is a computerized text that can be easily processed [
        <xref ref-type="bibr" rid="ref1 ref5 ref6">1, 5, 6, 7</xref>
        ]. The recognized text may be used in cognitive computing, text mining, and text-to-speech [8]. Nowadays such text is used in a large number of modern and emerging IT solutions and tools [9], first of all in artificial intelligence (AI) systems [10-12] and machine learning, in ICT, in robotics based on machine vision [13, 14, 15], and in computational intelligence [16-19].
      </p>
      <p>
        There are several approaches to recognizing text in images. The core algorithms of OCR systems belong to one of two basic types, matrix matching and feature extraction, with different efficiency in different situations [
        <xref ref-type="bibr" rid="ref1 ref5">1, 5</xref>
        ].
      </p>
      <p>Matrix matching (also called pattern matching, pattern recognition, or image correlation) compares an image with a stored glyph pixel by pixel. Feature extraction decomposes glyphs into features that are compared with a vector-based representation of the character. Most often, neural networks (NNs) are used to detect the features [10, 11]. This approach is more accurate than matrix matching, so most modern OCR systems use it. Since the second approach relies on NNs, even a single algorithm can show different results depending on hyperparameters, image quality, etc. Choosing the appropriate OCR for a specific data source can therefore be quite a serious challenge.</p>
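      <p>As an illustration of the first approach, the sketch below implements pixel-by-pixel matrix matching against stored glyph templates. The tiny 3x3 binary glyphs and labels are invented for the example and are not taken from either OCR system.</p>

```python
def match_score(patch, template):
    """Fraction of pixels at which a binarized patch agrees with a stored glyph."""
    pixels_p = [px for row in patch for px in row]
    pixels_t = [px for row in template for px in row]
    assert len(pixels_p) == len(pixels_t)
    return sum(p == t for p, t in zip(pixels_p, pixels_t)) / len(pixels_p)

def classify(patch, templates):
    """Return the label of the best-matching stored glyph."""
    return max(templates, key=lambda label: match_score(patch, templates[label]))

# Toy 3x3 binary "glyphs": a vertical bar and a horizontal bar.
templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "-": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
}
noisy_I = [[0, 1, 0], [0, 1, 0], [1, 1, 0]]  # one flipped pixel
print(classify(noisy_I, templates))  # I
```

      <p>Even with one corrupted pixel the vertical bar still wins (8 of 9 pixels agree versus 4 of 9), which is exactly why matrix matching tolerates mild noise but degrades quickly under heavy distortion.</p>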
      <p>The purpose of this article is to carry out a comparative analysis of the efficiency of the most widespread OCRs (EasyOCR and TesserOCR) based on the results of text recognition experiments under different data acquisition conditions; to describe the features of using the considered OCRs for common data sources; and to offer an algorithm and recommendations for the effective practical application of EasyOCR and TesserOCR under standard conditions of use. Accordingly, the experiments focus on assessing the effectiveness of state-of-the-art OCR applications on standard computers without special support for GPU or HPC computing.</p>
      <p>Image preprocessing is not considered here, since it is a rather broad topic that requires additional research. The images are assumed to have been prepared, and no preprocessing steps are taken beyond those the OCR systems perform by default.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Experiment planning</title>
      <p>For the experiments, images of common data sources with a different degree of distortion
were taken.</p>
      <p>All artifacts were applied to the photos using the corresponding command-line keys for TRDG at the dataset generation stage. A list of keys can be found in the user manual of the TRDG utility [22].</p>
      <p>All data used in this work, along with the recognition results and a summary table of results, are in the data warehouse [23]. The UML diagram in Figure 1 visualizes the file structure of the dataset used.</p>
      <p>Both services, TesserOCR and EasyOCR, were tested on the same input data and hardware under one test script, which measured text recognition time for both services.</p>
      <p>To ensure a level playing field for the two neural networks on which the tested services are based, we did not accelerate them explicitly through common practices such as batching or multithreading; the default settings provided by the developers were kept. Both NNs have a flexible API and wide possibilities for customization, so with correctly selected hyperparameters and training for a specific situation either of them may show much better results, especially when using GPU and HPC, but then it would be difficult to compare their productivity objectively.</p>
      <p>Since the networks accept files in different ways, this difference was also taken into consideration. EasyOCR can directly accept most image file formats; TesserOCR by default accepts only a few file formats directly when given their paths on a drive, which makes it less flexible than EasyOCR.</p>
      <p>TesserOCR relies on PIL (Python Imaging Library) for passing images. In effect, PIL simply loads images from the drive into an intermediate format; OpenCV, scikit-image, and similar libraries can also be used for this purpose. The use of the two libraries together is described in the user's manual and in the developers' examples, which can be found on the official project page. Preliminary experiments during the library selection phase showed that passing images through PIL does not significantly affect text recognition speed or quality.</p>
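      <p>In code, the PIL-based hand-off looks roughly as follows. The helper name is ours, and the commented calls sketch the documented entry points of the two wrappers; it is a sketch, not a definitive pipeline.</p>

```python
from PIL import Image

# PIL's Image object serves as the common in-memory intermediate format.
# `load_for_ocr` is a hypothetical helper, not part of either library.
def load_for_ocr(src):
    """Open any PIL-supported file (or accept a PIL image) and normalize to RGB."""
    img = src if isinstance(src, Image.Image) else Image.open(src)
    return img.convert("RGB")

# With tesserocr the image would then be passed as, for example:
#     tesserocr.image_to_text(load_for_ocr("page.png"))
# EasyOCR accepts a file path, raw bytes, or a numpy array directly:
#     easyocr.Reader(["en"]).readtext("page.png")
```

      <p>Since PIL only decodes the file into a raster, the extra step costs little relative to recognition itself, which matches the observation above.</p>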
      <p>Based on this, and on the fact that this approach is effectively standard for TesserOCR, it was used in the NN tests to put both networks on a level playing field. For the purity of the experiment, the time spent in PIL functions was also counted for TesserOCR, because EasyOCR likewise has to perform image transformation inside its recognition call.</p>
      <p>During the experiment, the test script launches both NNs one by one to recognize every image in the dataset, marks the start and end time of text recognition, and saves the time value and recognized data for each example in JSON format for further processing. For experimental purity, the tests were run 3 times to reduce the possible impact of low-level OS processes or the behavior of the PC hardware. In total this is 122 recognitions per iteration (61 per NN), so 366 recognitions were made during the experiment.</p>
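      <p>A minimal sketch of such a test script is shown below. Stub functions stand in for the real EasyOCR and TesserOCR calls, and the file names are illustrative; only the loop structure and the JSON-ready record format reflect the procedure described above.</p>

```python
import json
import time

def run_trials(images, recognizers, runs=3):
    """Time every recognizer on every image over several runs; collect JSON-ready records."""
    results = []
    for run in range(runs):
        for name, recognize in recognizers.items():
            for image in images:
                start = time.perf_counter()
                text = recognize(image)
                elapsed = time.perf_counter() - start
                results.append({"run": run, "ocr": name, "image": image,
                                "seconds": elapsed, "text": text})
    return results

# Stub recognizers stand in for the real calls, e.g.
#     tesserocr.image_to_text(...) and easyocr.Reader(["en"]).readtext(...)
recognizers = {"tesserocr": lambda img: "stub", "easyocr": lambda img: "stub"}
images = [f"img_{i}.png" for i in range(61)]  # 61 images, as in the dataset
results = run_trials(images, recognizers, runs=3)
print(len(results))  # 366 recognitions: 3 runs x 2 NNs x 61 images
```

      <p>Serializing each record with <code>json.dumps</code> then yields the per-example time and text values used for further processing.</p>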
      <p>The experiment was run on an HP EliteBook 8460p laptop with a dual-core mobile Intel Core i5-2540M CPU at 3100 MHz (31 x 100) with 4 threads, 4 GB of DDR3 RAM, and an Intel Sandy Bridge-MB integrated graphics controller (MB GT2, 1.3 GHz+) under Kali Linux. This computer was taken as an instance of the target device class of the experiment.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Experiment, results and discussion</title>
      <p>Estimates of the results of experiments are presented in Table 1 – Table 12.</p>
      <p>Table 1 shows the recognition time in seconds for a scanned fragment of a book page, where full/half/quarter in the file name denotes the page filling degree and high/medium/worst the image quality level.</p>
      <p>Table 12 shows the probability of correct character recognition on clear photos during a series of
experiments.</p>
      <p>We found that TesserOCR is much faster than EasyOCR in all test cases. TesserOCR also shows noticeably better accuracy in most cases, but for the recognition of small, heavily distorted images EasyOCR is more accurate. So, for images of large plain text, such as book pages, the best OCR is TesserOCR, while for short texts with a lot of noise a sped-up EasyOCR may be preferred.</p>
      <p>A direct relationship was found between the accuracy of recognition within one format and the recognition time, and also a direct dependence on the quality of the recognized image regardless of file format. With lower quality, recognition is faster; at the same time, the inverse dependence of recognition quality on image quality is weakly expressed and does not always manifest itself. Among the image formats, BMP often falls out of the general pattern: it corresponds to the average quality of the JPG format, slightly inferior to it in time but superior in quality. In fact, the file format does not directly affect recognition quality, since images are usually converted to an intermediate format, but the quality of the image stored on disk does affect it directly.</p>
      <p>During the control experiments, under the same conditions and on the same inputs, a spread of time values between tests was detected, most likely due not to features of the computer hardware and software but to features of the OCR algorithms. Despite the differences between runs, the results of all tests are grouped around the mean, and the spread around it grows with the quality and the volume of the recognized text. For each test, the spread for EasyOCR is much larger than for TesserOCR.</p>
      <p>TesserOCR is much faster than EasyOCR and better recognizes regular book text, but for natural scene images EasyOCR works better. EasyOCR is still inferior to its opponent in time, but not in quality: it recognizes text fully or partially where TesserOCR does not recognize anything. Table 13 gives superiority rates for TesserOCR and EasyOCR by recognition time, quality, and symbols-per-second ratio. To construct this table, we calculated the average indicators of time and accuracy from the tables above and divided the corresponding indicators for EasyOCR by those for TesserOCR; quality (accuracy) is expressed by the ratio of correct recognition probabilities. Our calculations may be expressed by the formula R = ((1/n) ∑ e_i) / ((1/n) ∑ t_i), where R is the superiority rate for the corresponding indicator, n is the number of tests, e_i are the indicators obtained in the experiments for EasyOCR, and t_i are the indicators obtained for TesserOCR. Figures 2 and 3 provide a visualized presentation of the obtained superiority rates, and Figure 4 visualizes the superiority rate for TesserOCR and EasyOCR by symbols-per-second ratio.</p>
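      <p>The superiority-rate calculation can be sketched as follows. The sample values are hypothetical and only illustrate the formula; they are not the measured data.</p>

```python
def superiority_rate(e, t):
    """R = mean(EasyOCR indicator) / mean(TesserOCR indicator) over n tests."""
    assert len(e) == len(t) and len(t) > 0
    return (sum(e) / len(e)) / (sum(t) / len(t))

# Hypothetical per-test recognition times in seconds (illustrative only).
easy_times = [4.0, 6.0, 5.0]
tess_times = [1.0, 2.0, 1.5]
print(round(superiority_rate(easy_times, tess_times), 2))  # 3.33
```

      <p>Here R &gt; 1 for time means EasyOCR is that many times slower; applied to accuracy values, R &gt; 1 would instead mean EasyOCR is more accurate.</p>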
    </sec>
    <sec id="sec-4">
      <title>4. Algorithm and recommendations</title>
      <p>Analyzing the obtained experimental results, it is possible to formulate an algorithm and recommendations for the practical use of the reviewed text recognition tools. The common text recognition algorithm consists of two stages: 1) preprocessing and 2) recognition of the prepared text. In the preprocessing stage, if the image was not originally in BMP format, it is advisable to convert it into a bitmap before submitting it to the neural network, e.g. using PIL (Python Imaging Library) or any other appropriate image processing library (OpenCV, scikit-image, etc.). This gives the most balanced ratio of time to quality among the formats and also makes work with images more efficient. Converting into a bitmap unifies the image processing code in the final application; besides, in some situations bitmap capture from a screen is the fastest way to feed images. Nevertheless, if speed is the priority parameter of recognition, it is advisable to convert images into lighter formats with lower quality; in this case the recognition quality will drop unless the network is retrained or its hyperparameters are tuned. For effective use of both frameworks, it is necessary to tune hyperparameters, to train the neural network for the specific input data types, and to convert images into bitmaps before recognition; performance will then be much better than described in the experiment. According to the experimental results, the TesserOCR library can be recommended for recognizing scanned books, screenshots, and clear or softly blurred photos, while the EasyOCR library can be recommended for recognizing advertisements, banners, and heavily distorted photos (i.e. natural scene images). Based on the results of the analysis, it should also be noted that the EasyOCR library is being actively developed and in the coming years may surpass TesserOCR by certain parameters, especially under neural network optimization and acceleration.</p>
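      <p>The recommended bitmap conversion step might be sketched as below, assuming PIL; <code>to_bitmap</code> is an illustrative helper, not part of either framework.</p>

```python
import io
from PIL import Image

def to_bitmap(src):
    """Convert an input image (path or PIL image) to an uncompressed BMP in memory."""
    img = src if isinstance(src, Image.Image) else Image.open(src)
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="BMP")  # BMP: no lossy compression
    buf.seek(0)
    return Image.open(buf)

# Example: a synthetic 4x4 image round-trips through BMP losslessly.
src = Image.new("RGB", (4, 4), (200, 10, 10))
bmp = to_bitmap(src)
print(bmp.format, bmp.size)  # BMP (4, 4)
```

      <p>Because BMP stores pixels without lossy compression, this normalization preserves whatever quality the source file had while unifying the downstream recognition code.</p>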
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In this work, an experiment was set up and a comparative analysis was made of the performance of the most common OCRs (TesserOCR and EasyOCR) by time and by accuracy of recognition under equal conditions (with the default configuration). The application features of both services were described for the different data sources that can appear during real use of the neural networks. The obtained results describe only neural network performance without a training stage and give only a general understanding of how each of them functions and of their fields of application at the time of writing. A complete analysis of the capabilities of each NN requires deeper research, which, however, was not the aim of this work: it implies serious modification and optimization of the out-of-the-box code, lower-level programming, retraining of the NNs provided by the developers, and image preprocessing. Based on the analysis of the experimental results, an algorithm and recommendations for the effective practical application of the reviewed OCRs are proposed. It is also recommended to configure and train the NN for each specific case (data source) to obtain the best performance.</p>
      <p>It is also important to remember that the results of the experiments are relevant only to standard computers without support for GPU and HPC calculations. This was done deliberately to evaluate the performance of real applications that will run on the same standard computers.</p>
    </sec>
    <sec id="sec-6">
      <title>6. References</title>
      <p>[7] An Introduction to OCR for Beginners, 2020. URL: https://towardsdatascience.com/an-introduction-to-optical-character-recognition-for-beginners-14268c99d60.</p>
      <p>[8] What is Cognitive Computing? How are Enterprises benefitting from Cognitive Technology, 2019. URL: https://towardsdatascience.com/what-is-cognitive-computing-how-are-enterprises-benefitting-from-cognitive-technology-6441d0c9067b.</p>
      <p>[9] Kyrychenko I., Gruzdo I., Tereshchenko G., Cherednichenko O. Application of Paragraphs Vectors Model for Semantic Text Analysis, in: Proceedings of the 4th International Conference on Computational Linguistics and Intelligent Systems (COLINS-2020), Lviv, Ukraine, April 23-24, 2020, Volume I, pp. 283-293.</p>
      <p>[10] Ian Goodfellow, Yoshua Bengio, Aaron Courville, Deep Learning, MIT Press, 2016, 787 p. ISBN: 0262035618.</p>
      <p>[11] Stuart Russell, Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd ed., Pearson Education Limited, 2016, 1152 p. ISBN: 9780136042594.</p>
      <p>[12] Kyrychenko I., Proniuk G., Geseleva N., Tereshchenko G. Spatial interpretation of the notion of relation and its application in the system of artificial intelligence, in: Proceedings of the 3rd International Conference on Computational Linguistics and Intelligent Systems (COLINS-2019), Kharkiv, Ukraine, April 18-19, 2019, pp. 266-276.</p>
      <p>[13] Rafael C. Gonzalez, Richard E. Woods, Digital Image Processing, 4th ed., Pearson/Prentice Hall, 2018, 1168 p. ISBN: 9780133356724.</p>
      <p>[14] Chupryna A., Smelyakov K., Hvozdiev M., Sandrkin D. Gradational Correction Models Efficiency Analysis of Low-Light Digital Image, in: Proceedings of the 2019 Open Conference of Electrical, Electronic and Information Sciences (eStream), 25 April 2019, Vilnius, Lithuania, pp. 1-6. doi: 10.1109/eStream.2019.8732174.</p>
      <p>[15] Datsenko A., Skrypka V., Akhundov A., Smelyakov K. Efficiency of Image Reduction Algorithms with Small-Sized and Linear Details, in: Proceedings of the 2019 IEEE International Scientific-Practical Conference Problems of Infocommunications, Science and Technology (PIC S&amp;T), 8-11 Oct. 2019, Kyiv, Ukraine, pp. 745-750. doi: 10.1109/PICST47496.2019.9061250.</p>
      <p>[16] Sharonova N., Kanishcheva O. Image and video tag aggregation // CEUR Workshop Proceedings, Vol. 2268, 2017, pp. 161-172.</p>
      <p>[17] Kanishcheva O., Cherednichenko O., Sharonova N. Image Tag Core Generation // Proceedings of the 1st International Workshop on Digital Content &amp; Smart Multimedia (DCSMart 2019), Lviv, Ukraine, December 23-25, 2019, pp. 35-44.</p>
      <p>[18] Cherednichenko O., Yanholenko O., Vovk M., Sharonova N. Towards structuring of electronic marketplaces contents: Items normalization technology // CEUR Workshop Proceedings, Vol. 2604, 2020, pp. 44-55.</p>
      <p>[19] Smelyakov K., Yeremenko D., Sakhon A., Polezhai V., Chupryna A. Braille character recognition based on neural networks, in: Proceedings of the IEEE Second International Conference on Data Stream Mining &amp; Processing (DSMP), August 21-25, 2018, pp. 509-513.</p>
      <p>[20] Tesserocr, 2020. URL: https://github.com/sirfz/tesserocr.</p>
      <p>[21] EasyOCR, 2021. URL: https://www.jaided.ai/easyocr.</p>
      <p>[22] TextRecognitionDataGenerator, 2020. URL: https://github.com/Belval/TextRecognitionDataGenerator.</p>
      <p>[23] Data storage with our data and results, 2021. URL: https://drive.google.com/drive/folders/1dy2gEDBf38t_TY1933g57JyDJ2J4MmWA?usp=sharing.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] What is OCR and OCR Technology, <year>2020</year>. URL: https://pdf.abbyy.com/learning-center/what-is-ocr.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] <string-name><given-names>D.</given-names> <surname>Berchmans</surname></string-name> and <string-name><given-names>S. S.</given-names> <surname>Kumar</surname></string-name>, "<article-title>Optical character recognition: An overview and an insight</article-title>", <source>in: Proceedings of the 2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT)</source>, Kanyakumari, <year>2014</year>, pp. <fpage>1361</fpage>-<lpage>1365</lpage>. doi: 10.1109/ICCICCT.2014.6993174.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] <string-name><given-names>A. M.</given-names> <surname>Sabu</surname></string-name> and <string-name><given-names>A. S.</given-names> <surname>Das</surname></string-name>, "<article-title>A Survey on various Optical Character Recognition Techniques</article-title>", <source>in: Proceedings of the 2018 Conference on Emerging Devices and Smart Systems (ICEDSS)</source>, Tiruchengode, <year>2018</year>, pp. <fpage>152</fpage>-<lpage>155</lpage>. doi: 10.1109/ICEDSS.2018.8544323.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] <string-name><given-names>M.</given-names> <surname>Rahman Majumder</surname></string-name>, <string-name><given-names>B. Uddin</given-names> <surname>Mahmud</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Jahan</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Alam</surname></string-name>, "<article-title>Offline optical character recognition (OCR) method: An effective method for scanned documents</article-title>", <source>in: Proceedings of the 2019 22nd International Conference on Computer and Information Technology (ICCIT)</source>, Dhaka, Bangladesh, <year>2019</year>, pp. <fpage>1</fpage>-<lpage>5</lpage>. doi: 10.1109/ICCIT48885.2019.9038593.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] What is OCR? <article-title>An Introduction to Optical Character Recognition</article-title>, <year>2020</year>. URL: https://anyline.com/news/what-is-ocr.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] <string-name><given-names>M.</given-names> <surname>Brisinello</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Grbić</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Stefanovič</surname></string-name> and <string-name><given-names>R.</given-names> <surname>Pečkai-Kovač</surname></string-name>, "<article-title>Optical Character Recognition on images with colorful background</article-title>", <source>in: Proceedings of the 2018 IEEE 8th International Conference on Consumer Electronics</source>, Berlin, <year>2018</year>, pp. <fpage>1</fpage>-<lpage>6</lpage>. doi: 10.1109/ICCE-Berlin.2018.8576202.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>