<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Deep Learning-Based Framework for Text Detection and Recognition in Natural Images</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Djouher Akrour</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohamed Akram Khelili</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Imene Aloui</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Azeddine Aissaoui</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre de Recherche Scientifiques et Techniques sur les Régions Arides</institution>
          ,
          <addr-line>Campus Universitaire</addr-line>
          ,
          <institution>Université Mohamed Khider</institution>
          ,
          <addr-line>Biskra</addr-line>
          ,
          <country country="DZ">Algeria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LESIA laboratory / Department of Computer Science, Biskra University</institution>
          ,
          <addr-line>PB 145 RP, 07000 Biskra</addr-line>
          ,
          <country country="DZ">Algeria</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>LINFI laboratory / Department of Computer Science, Biskra University</institution>
          ,
          <addr-line>City communal 197 Biskra</addr-line>
          ,
          <country country="DZ">Algeria</country>
        </aff>
      </contrib-group>
      <fpage>79</fpage>
      <lpage>88</lpage>
      <abstract>
<p>Detecting and recognizing text in natural images is a critical task for extracting meaningful information, yet it remains highly challenging due to the variability and complexity of unstructured text in real-world scenarios. Traditional image processing techniques often rely on handcrafted features, which struggle to adapt to the diverse and unpredictable nature of text in the wild. To address these limitations, this paper leverages advancements in deep learning to develop a robust framework capable of adaptive feature learning, text extraction, and digitization. The proposed method utilizes YOLOv5 for precise localization of text-rich regions, followed by an LSTM-based module to segment text into individual characters. These characters are subsequently processed by a Capsule Network-based recognition module, ensuring accurate text recognition. A semantic post-processing step is incorporated to further enhance the system's overall performance. Experimental evaluations conducted on popular benchmark datasets demonstrate that the proposed framework significantly outperforms existing state-of-the-art methods, achieving superior accuracy and efficiency in both text detection and recognition tasks.</p>
      </abstract>
      <kwd-group>
<kwd>Capsule Network</kwd>
        <kwd>Yolov5</kwd>
        <kwd>LSTM</kwd>
        <kwd>Text detection</kwd>
        <kwd>Text recognition</kwd>
        <kwd>Semantic recognition</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        performance, making it suitable for practical deployment such as Faster R-CNN [29], rely on regional proposals
in robotics and mobile applications. In the second stage, a and have inspired advanced models like Connectionist
Latin text recognition module is introduced, which com- Text Proposal Network (CTPN) [30], R2CNN [31], and
bines character segmentation via an LSTM network and RRPN [
        <xref ref-type="bibr" rid="ref21">32, 33</xref>
        ]. For example, TextFuseNet [
        <xref ref-type="bibr" rid="ref22 ref23">34, 35</xref>
        ] uses
text recognition using a Capsule Network (CapsNet) to multi-level feature representations and multi-path fusion
capture complex spatial relationships between charac- to enhance text detection, achieving high accuracy but
ters and words. The system is further enhanced by a with significant computational overhead. On the other
semantic post-processing step that applies grammatical hand, one-stage approaches eliminate the region
procorrections and evaluates word similarity using metrics posal phase and directly estimate candidate text regions
such as Levenshtein distance and cosine similarity. from feature maps. Networks such as YOLO [
        <xref ref-type="bibr" rid="ref24 ref25 ref26">36, 37, 38</xref>
        ],
      </p>
      <p>
        The primary contributions of this work are as follows: SSD [
        <xref ref-type="bibr" rid="ref27">39</xref>
        ], and their derivatives have demonstrated
exFirst, we present a robust end-to-end system for scene ceptional eficiency. For instance, Gupta et al. [
        <xref ref-type="bibr" rid="ref28">40</xref>
        ]
intetext detection and recognition tailored for Latin scripts. grated YOLO with a random-forest classifier to reduce
Second, we propose an eficient one-stage text detector false positives, while He et al. [
        <xref ref-type="bibr" rid="ref29">41</xref>
        ] incorporated an
atbased on a Fully Convolutional Network (FCN), which tention mechanism in SSD to suppress background noise.
handles multi-scale text detection without introducing Similarly, TextBoxes [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and its extension, TextBoxes++
excessive computational overhead. Third, we introduce [
        <xref ref-type="bibr" rid="ref30">42</xref>
        ], addressed varying text aspect ratios and arbitrary
an innovative recognition module that integrates LSTM orientations, respectively, while SegLink [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] used SSD
and CapsNet, achieving comparable performance to state- to segment text into smaller components linked into
comof-the-art systems in text recognition tasks. plete instances. EAST [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] directly employed a fully
      </p>
      <p>
        The remainder of this paper is organized as follows: convolutional network (FCN) for eficient text region
Section 2 provides an overview of related work, high- detection without unnecessary intermediate steps,
follighting significant advances in scene text detection, EEG lowed by thresholding and non-maximum suppression
classification, and robotic applications. Section 3 presents for refinement.
the details of the proposed framework. Section 4 outlines Text recognition methods are generally classified into
the experimental setups and performance evaluations, sequence-based, word-based, and character-based
apwhile Section 5 concludes with a summary and future proaches. Sequence-based approaches represent text as
directions. a sequence of characters. For example, CRNN [
        <xref ref-type="bibr" rid="ref31">43</xref>
        ]
combines convolutional and recurrent neural networks to
extract feature sequences and model contextual
infor2. Related work mation. Similarly, Shi et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] integrated a spatial
transformer network with a sequence recognition
network to robustly recognize irregular text. Word-based
approaches, such as Jaderberg et al.’s method [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], focus
on recognizing entire words by training convolutional
neural networks on synthetic word datasets. While these
methods have achieved state-of-the-art performance,
they are often constrained by a predefined vocabulary.
      </p>
      <p>
        Character-based approaches, on the other hand, detect
and recognize individual characters before assembling
them into words. For instance, Minetto et al. [
        <xref ref-type="bibr" rid="ref32">44</xref>
        ]
utilized histograms of oriented gradients for character
description and recognition, while Yao et al. [45] proposed
Strokelets, a robust multi-scale representation capturing
character structures at diferent levels. This approach
ofers greater flexibility and is not limited by text length,
making it suitable for complex scenarios.
      </p>
      <p>These advancements in both text detection and
recognition have significantly contributed to the development
of more robust and eficient systems, laying a strong
foundation for further research in this domain.</p>
      <sec id="sec-1-1">
        <title>The detection and recognition of scene text have garnered</title>
        <p>
          substantial attention in the computer vision domain due
to their significance in numerous real-world applications.
Over the years, various methods have been proposed to
tackle the challenges associated with scene text detection
and recognition, which have been thoroughly reviewed
in several comprehensive surveys and analyses [
          <xref ref-type="bibr" rid="ref20">20, 21</xref>
          ].
These methods can be broadly classified into two main
categories: text detection and text recognition.
        </p>
        <p>
          Scene text detection approaches can be divided into
traditional machine learning-based methods and modern
deep learning-based methods. Traditional approaches
rely heavily on handcrafted features and techniques such
as sliding windows and connected components to detect
text in natural scene images [22, 23, 24, 25, 26]. Although
these methods have shown promising results, they
often sufer from a high rate of false positives when
applied to complex and diverse real-world scenarios. In
contrast, deep learning-based methods have emerged
as the dominant approach, ofering improved accuracy
and robustness [
          <xref ref-type="bibr" rid="ref11 ref14">11, 27, 14, 28</xref>
          ]. Deep learning-based text
detection methods can be further categorized into
twostage and one-stage strategies. Two-stage approaches,
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Proposed model</title>
      <sec id="sec-2-1">
        <title>The proposed model, as illustrated in Figure 1, consists</title>
        <p>of two imperative component including text detector and
text recognizer. Firstly, candidate text region is localized
from input image using one-stage text detector based on
YOLOv5. Following that, text image is segmented into
set of individual character patches using BILSTM-based
segmentation technique. Then, these patches pass
oneby-one to the capsule network which help to accurately
recognize each character. The Set of recognized
characters form complete word which pass by Post-Processing
module to apply semantic correction in order to enhance
the accuracy and efectiveness of recognizer component.
More details about each component are described below.</p>
        <sec id="sec-2-1-1">
          <title>3.1. One-stage text detector</title>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Yolov5 was chosen as our scene text detector for several</title>
        <p>key reasons. First, it integrates the Cross Stage Partial
Network (CSPNet) [46] with Darknet, forming
CSPDarknet as its backbone. This design enhances inference speed
and accuracy while reducing computational complexity
by merging feature maps from diferent network stages.
Second, Yolov5 employs the Path Aggregation Network
(PANet) [47] to improve information flow. PANet uses
an enhanced Feature Pyramid Network (FPN) structure
with a shorter bottom-up path to better propagate
lowlevel features, aiding the model’s performance on unseen
data and improving text scaling. Additionally, adaptive
feature pooling ensures valuable information is passed
through each feature level, enhancing localization
accuracy for text detection. Finally, Yolov5’s detection heads
generate three diferent feature map sizes, enabling
multiscale predictions and enabling the detector to handle text
of varying sizes under challenging real-world conditions.</p>
        <sec id="sec-2-2-1">
          <title>3.2. Text Recognition System</title>
        </sec>
      </sec>
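        <p>As a minimal sketch of this stage, the following PyTorch snippet loads a YOLOv5 model through the public ultralytics/yolov5 hub entry and runs it on one image. The weights file text_det.pt and the confidence threshold are illustrative assumptions, not artifacts released with this paper.</p>
        <preformat>
import torch

# Load a YOLOv5 model from the official ultralytics/yolov5 hub entry,
# using hypothetical custom weights fine-tuned on scene-text boxes.
model = torch.hub.load("ultralytics/yolov5", "custom", path="text_det.pt")
model.conf = 0.4  # assumed confidence threshold for text regions

results = model("scene.jpg")           # run inference on one image
boxes = results.xyxy[0].cpu().numpy()  # rows: (x1, y1, x2, y2, conf, class)
for x1, y1, x2, y2, conf, cls in boxes:
    print(f"text region ({x1:.0f},{y1:.0f})-({x2:.0f},{y2:.0f}) conf={conf:.2f}")
        </preformat>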
      <sec id="sec-2-3">
        <title>In this section, we introduce the second stage of our framework which consists of three modules:</title>
          <p>After detecting the text using YOLOv5, two LSTM layers with 256 units each are used to learn long-range temporal dependencies. The LSTM architecture consists of three gates, called the input, forget, and output gates, connected to memory cells that allow the LSTM to store previous context for a long time. The input gate encodes information by applying the hyperbolic tangent function (tanh) to the current input (x<sub>t</sub>) and the previous cell output (h<sub>t−1</sub>) in order to generate a vector of values between −1 and +1. Meanwhile, the forget gate multiplies (x<sub>t</sub>) and (h<sub>t−1</sub>) by weight matrices, adds a bias, and passes the result through the sigmoid activation function, which yields near-binary values: 0 means the cell information will be cleared, whereas 1 means the cell information will be stored for future use. The output gate applies the sigmoid and tanh functions to the current input (x<sub>t</sub>) and the previous cell output (h<sub>t−1</sub>), then multiplies the result with the vector of values generated by the input gate to produce an output that is passed to the next cell.</p>
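          <p>For clarity, the gate computations described above correspond to the standard LSTM formulation below (a conventional rendering, with W, U, and b denoting learned weights and biases, σ the sigmoid function, and ⊙ elementwise multiplication; the exact parameterization of our implementation may differ slightly):</p>
          <disp-formula><tex-math><![CDATA[
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
]]></tex-math></disp-formula>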
          <p>In our work, we use a bidirectional LSTM (BiLSTM), as shown in Figure 2, to capture context information from each vector of the detected words by applying a forward and a backward LSTM. The forward LSTM analyzes the sequence of forward hidden states, which at each time step t depend only on the left neighbors, while the backward LSTM analyzes the sequence of backward hidden states, which depend only on the right neighbors. In the last step, the forward and backward results are concatenated to represent the character segment at each time step:</p>
          <disp-formula><tex-math><![CDATA[
h_t = [\,\overrightarrow{h}_t \,;\, \overleftarrow{h}_t\,], \qquad
\overrightarrow{h} = \{\overrightarrow{h}_1, \overrightarrow{h}_2, \ldots, \overrightarrow{h}_t\}, \quad
\overleftarrow{h} = \{\overleftarrow{h}_1, \overleftarrow{h}_2, \ldots, \overleftarrow{h}_t\}
]]></tex-math></disp-formula>
          <p>The output of the segmentation is a sequence of character images, which are fed to the CapsNet after being converted to binary images.</p>
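          <p>A BiLSTM segmenter matching this description could be sketched in PyTorch as follows. The input feature size, the two output labels, and the class name are our assumptions for illustration, not the paper's exact configuration.</p>
          <preformat>
import torch
import torch.nn as nn

# Minimal sketch: two stacked bidirectional LSTM layers with 256 units
# per direction, followed by a per-timestep segmentation head.
class BiLSTMSegmenter(nn.Module):
    def __init__(self, feat_dim=64, hidden=256, n_labels=2):
        super().__init__()
        self.bilstm = nn.LSTM(feat_dim, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)
        # Forward and backward states are concatenated: 2 * hidden.
        self.head = nn.Linear(2 * hidden, n_labels)  # e.g. boundary / not-boundary

    def forward(self, x):         # x: (batch, time, feat_dim)
        h, _ = self.bilstm(x)     # h_t = [forward h_t ; backward h_t]
        return self.head(h)       # per-timestep segmentation logits

logits = BiLSTMSegmenter()(torch.randn(1, 100, 64))  # shape (1, 100, 2)
          </preformat>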
      <sec id="sec-2-5">
        <title>Here, we tend to apply the same CapsNet structure employed previously in [6] and modifying it according to our purpose. Figure 3 depicts the overall CapsNet structure used for scene text recognition.</title>
        <p>The CapsNet structure is composed of an encoder and
a decoder, former of which comprises of:
•  Convolutional Layers: The layer has 256
kernels each with a bias term, stride of 1, size of In this module, English lexicon, Levenshtein distance
9× 9× 1 followed by the rectified linear activation [48] and cosine similarity [49] metrics are adopted to
( ). This layer used as lower-level feature grammatically check the resulted word from CapsNet.
extractors and outputs 20 × 20 × 256 tensor. The main purpose of use such metrics is to determine the
• PrimaryCaps layers: The 8 capsule layer applies required number of changes (inserting, deleting or
replac9 × 9 × 256 convolutional kernels, with stride ing a character in word) and enhancing recognizer
com2, to the 20 × 20 × 256 input tensor. This layer putational eficiency by reducing the number of words
produce combination of the above feature outputs that will be treated by cosine metric. Figure 4 depicts the
and generates 6 × 6 × 8 × 8 tensor. overall architecture of post-processing module.
• CharCaps Layers: These 70 capsule layers are The word generated by CapsNet pass firstly to the
used for the generation of the loss function and lexicon for selecting the set of words that have the same
transformational weight matrix. Stem of the input word. Then, this set of words will be
handled one-by-one by the two metrics mentioned before.</p>
        <p>Whereas, Decoder consists of three Fully Connected Finally, the word with the highest cosine similarity is
layers (FC). chosen as the correct word.</p>
        <p>The loss function is calculated for correct and incorrect Levenshtein [48] is based on calculating the distance
CharCaps, primarily defined as 1 if the correct label corre- matrix between the components of two words. The first
sponds with the character of this particular CharCap and step is to create matrix of shape ( + 2,  + 2) where 
0 otherwise. A zero-loss event is initiated either when a and  are the size of the two words. The first two lines
probability of right or wrong prediction is greater than represent the first word and indices respectively, and the
+ or less than − , respectively. For each CharCaps first two columns represent the second word and indices
capsule, , the incurred loss is as follows: respectively. Then, the matrix should be completed with
where:
with  coupling coeficients measuring the probability
of primary capsule  probabilistically triggering capsule
.  representing the weighted sum shrinked by the
squashing function.
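          <p>The two equations above translate directly into code. The following PyTorch sketch (tensor shapes and function names are assumptions for illustration) implements the squashing nonlinearity and the margin loss of Eq. (1):</p>
          <preformat>
import torch

def squash(s, dim=-1, eps=1e-8):
    # v = (|s|^2 / (1 + |s|^2)) * (s / |s|): short vectors shrink toward 0,
    # long vectors saturate toward unit length.
    sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq / (1.0 + sq)) * s / torch.sqrt(sq + eps)

def margin_loss(v, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    # v: (batch, n_classes, caps_dim) CharCaps outputs;
    # targets: one-hot (batch, n_classes). Implements Eq. (1).
    lengths = v.norm(dim=-1)
    loss = targets * torch.clamp(m_pos - lengths, min=0) ** 2 \
         + lam * (1 - targets) * torch.clamp(lengths - m_neg, min=0) ** 2
    return loss.sum(dim=1).mean()
          </preformat>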
        </sec>
        <sec id="sec-2-5">
          <title>3.2.3. Post-processing module</title>
          <p>In this module, an English lexicon, the Levenshtein distance [48], and the cosine similarity metric [49] are adopted to grammatically check the word produced by the CapsNet. The main purpose of using such metrics is to determine the required number of changes (inserting, deleting, or replacing a character in a word) and to enhance the recognizer’s computational efficiency by reducing the number of words that must be treated by the cosine metric. Figure 4 depicts the overall architecture of the post-processing module. The word generated by the CapsNet first passes to the lexicon, which selects the set of words sharing the stem of the input word; this set of words is then handled one by one by the two metrics mentioned above. Finally, the word with the highest cosine similarity is chosen as the correct word.</p>
          <p>The Levenshtein distance [48] is based on calculating a distance matrix between the components of two words. The first step is to create a matrix of shape (n + 2, m + 2), where n and m are the lengths of the two words: the first two rows hold the first word and the character indices, the first two columns hold the second word and its indices, and the remaining cells are initialized to 0, as in the matrix built for the words “beter” and “better”. After that, the characters of the two words are compared character by character along each row and each column: the value at position (i, j) is the minimum of the three values [D(i − 1, j) + 1], [D(i − 1, j − 1)] (incremented by 1 when the two characters differ), and [D(i, j − 1) + 1].</p>
      <sec id="sec-2-6">
        <title>As we see in the resulting matrix, the positions (5, 4)</title>
        <p>and (6, 5) have the value 1 which are incorrect because
the letter “e” in the position (5, 0) is equal to the letter “e”
in the position (0, 4). In addition to that, the Levenshtein
distance between the two words is 1 which means that
there is missing character in the second word. Using
Levenshtein distance allows recognizer to select three
most identical words from the set of words who will be
next treated by the cosine metric.</p>
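          <p>A minimal Python implementation of this dynamic program behaves as described (it uses the standard (n + 1) × (m + 1) table; the two extra header rows and columns in the matrices above hold the words and indices and are presentational only):</p>
          <preformat>
def levenshtein(a: str, b: str) -> int:
    # Standard edit-distance dynamic program.
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match / substitution
    return d[len(a)][len(b)]

assert levenshtein("beter", "better") == 1  # one missing character
          </preformat>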
          <p>Cosine similarity is based on calculating the cosine of the angle between the words’ vectors [49]. After constructing the vectors of the two words (w<sub>1</sub>, w<sub>2</sub>), the cosine similarity is calculated as follows:</p>
          <disp-formula><tex-math><![CDATA[
\cos(w_1, w_2) = \frac{w_1 \cdot w_2}{\lVert w_1 \rVert \times \lVert w_2 \rVert}
= \frac{\sum_{i=1}^{n} w_{1i}\, w_{2i}}{\sqrt{\sum_{i=1}^{n} w_{1i}^{2}} \times \sqrt{\sum_{i=1}^{n} w_{2i}^{2}}} \qquad (5)
]]></tex-math></disp-formula>
          <p>For the two example words, cos(w<sub>1</sub>, w<sub>2</sub>) = 0.89. The values of the cosine similarity range between 0 and 1, where values closer to 1 indicate that the words are more similar.</p>
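          <p>For illustration, Eq. (5) can be computed over character-count vectors. Note that the paper does not specify how its word vectors are built, so the encoding below is an assumption and yields a slightly different value than 0.89 for the example pair:</p>
          <preformat>
import math
from collections import Counter

def cosine_similarity(w1: str, w2: str) -> float:
    # Assumed vectorization: character-count vectors over the union of
    # characters appearing in the two words.
    c1, c2 = Counter(w1), Counter(w2)
    chars = set(c1) | set(c2)
    dot = sum(c1[ch] * c2[ch] for ch in chars)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2)

print(round(cosine_similarity("beter", "better"), 2))  # ~0.96 under this encoding
          </preformat>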
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Experiments and results</title>
      <sec id="sec-3-1">
        <title>4.1. Datasets</title>
        <sec id="sec-3-1-1">
          <title>To evaluate the performance and versatility of our pro</title>
          <p>
            posed text detection and recognition framework, we
conduct experiments using four challenging benchmark
datasets: ICDAR2013 [50], ICDAR2015 [51],
MSRATD500 [52], and ICDAR2017-MLT [53]. The ICDAR2013
dataset is widely recognized as the standard benchmark
for horizontal text detection. It includes 229 training
images and 233 testing images, with word-level
annotations provided for each image. Similarly, the ICDAR2015
dataset comprises 1000 training images and 500
testing images, featuring various accidental scene text
instances annotated with quadrangular bounding boxes.
The MSRA-TD500 dataset contains 300 training images
and 200 test images, incorporating both English and
Chinese text. The text areas in this dataset are arbitrarily
oriented, and annotations are provided at the sentence
level, making it particularly challenging for text detection
models. The ICDAR2017-MLT dataset is a more complex
and diverse collection, consisting of 7200 training images,
1800 validation images, and 9000 testing images. This
dataset includes multi-oriented, multi-script, and
multilingual scene text instances with line-level and word-level
annotations, significantly increasing the dificulty of the
detection task. For the evaluation of text recognition, we
use a modified version of the EnglishFnt dataset from the
Chars74K collection [54], which has also been used in
previous works [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]. This dataset is employed for
training the Long Short-Term Memory (LSTM) network for
word segmentation. To assess the efectiveness of our
text detection and recognition system, we adopt the
standard evaluation metrics, including precision (P), recall (R),
and F-measure (F), to quantify detection and recognition
performance.
          </p>
        </sec>
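        <p>For reference, these metrics are computed in the standard way, with the F-measure as the harmonic mean of precision and recall over true positives (TP), false positives (FP), and false negatives (FN):</p>
        <disp-formula><tex-math><![CDATA[
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F = \frac{2\,P\,R}{P + R}
]]></tex-math></disp-formula>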
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Evaluation</title>
        <sec id="sec-3-2-1">
          <title>4.2.1. Text detection</title>
          <p>To assess the effectiveness of our framework in detecting horizontal and long text, we compare its performance with state-of-the-art text detection methods on the ICDAR2013 and MSRA-TD500 datasets. On the ICDAR2013 benchmark, our detector outperforms the other methods by at least 1%, except for TextFuseNet [<xref ref-type="bibr" rid="ref22">34</xref>]. On the MSRA-TD500 dataset, our detector achieves a precision of 89.5%, improving upon the SRPN+VGGDet [55] method, which has a precision of 87.3%. This improvement demonstrates the superiority of our framework in detecting long scene text using a single fully convolutional network. We also validate the performance of our detector on multilingual text detection using the ICDAR2017-MLT dataset. Except for DB-ResNet-50 [56], our detector delivers the highest precision, confirming that our YOLOv5-based framework effectively handles the diverse text shapes across different languages. For multi-oriented text detection on the ICDAR2015 dataset, our method achieves an F-measure of 55.7% and a precision of 76%. Compared to the one-stage methods EAST [<xref ref-type="bibr" rid="ref14">14</xref>], TextBoxes++ [<xref ref-type="bibr" rid="ref30">42</xref>], and RRD [<xref ref-type="bibr" rid="ref34">58</xref>], our precision is 7.3%, 11.2%, and 9.6% lower, respectively, but it is 2.9% higher than that of SegLink [<xref ref-type="bibr" rid="ref33">57</xref>]. This indicates that while our detector does not surpass the others in precision for multi-oriented text, it still performs competitively. Additionally, the use of multi-branch detection improves detection accuracy: by generating feature maps at three different sizes (18 × 18, 36 × 36, 72 × 72) and fusing them, our detector effectively utilizes both shallow and deep features. This enables it to capture rich details and semantic information, enhancing its ability to handle text of varying sizes. Overall, our experimental results demonstrate that the proposed text detector achieves performance comparable to state-of-the-art methods. It effectively detects horizontal, long, multilingual, and multi-oriented text in natural images, as illustrated in Figure 5. Despite the varying styles of the images, the results highlight the detector’s ability to accurately identify text with diverse shapes, orientations, sizes, and languages.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>4.2.2. Text recognition</title>
          <p>The segmentation results demonstrate an impressive 94% accuracy when training our LSTM model on the Chars74K dataset. This result highlights the ability of the LSTM to learn long-range temporal dependencies by utilizing both the forward and backward passes of the BiLSTM architecture. The model effectively captures the features of both previous and future characters within the image, enhancing the segmentation of the box image into sub-images, which are then passed to the CapsNet model.</p>
          <p>Experimental results show that our CapsNet model, trained on Chars74K images, achieves a recognition rate of 92%. As presented in Table 1, our character recognition model outperforms most state-of-the-art methods and achieves performance comparable to the best of them.</p>
          <table-wrap id="tab1">
            <label>Table 1</label>
            <caption><p>Recognition rate comparison of state-of-the-art methods on the Chars74K dataset</p></caption>
            <table>
              <thead>
                <tr><th>State-of-the-art methods</th><th>Recognition rate [%]</th></tr>
              </thead>
              <tbody>
                <tr><td>AlexNet [<xref ref-type="bibr" rid="ref35">59</xref>]</td><td>77.77</td></tr>
                <tr><td>GoogleNet [<xref ref-type="bibr" rid="ref35">59</xref>]</td><td>88.89</td></tr>
                <tr><td>Multiscale HoG Features [<xref ref-type="bibr" rid="ref36">60</xref>]</td><td>80</td></tr>
                <tr><td>ConvNet [<xref ref-type="bibr" rid="ref36">60</xref>]</td><td>71.69</td></tr>
                <tr><td>DCNN [<xref ref-type="bibr" rid="ref37">61</xref>]</td><td>90.32</td></tr>
                <tr><td>Proposed CapsNet architecture</td><td>92</td></tr>
              </tbody>
            </table>
          </table-wrap>
          <p>Our results also demonstrate CapsNet’s ability to handle a wide variety of character shapes and its robustness when dealing with datasets containing a larger number of classes (70 classes). Table 2 presents the accuracy, recall, and F1-score for a selection of characters from the Chars74K dataset. This improvement in performance is attributed to the complexity of the PrimaryCaps layers, which, by utilizing vectors during training, increase the model’s capacity to represent character information and effectively capture various character attributes.</p>
          <table-wrap id="tab2">
            <label>Table 2</label>
            <caption><p>Accuracy (Acc), Recall (Rec), and F1-score (F1) of character recognition (CapsNet)</p></caption>
            <table>
              <thead>
                <tr><th>Metric</th><th>0</th><th>2</th><th>9</th><th>I</th><th>P</th><th>Y</th><th>x</th><th>y</th><th>?</th></tr>
              </thead>
              <tbody>
                <tr><td>Acc [%]</td><td>78</td><td>99</td><td>99</td><td>89</td><td>91</td><td>92</td><td>80</td><td>96</td><td>97</td></tr>
                <tr><td>Rec [%]</td><td>83</td><td>99</td><td>98</td><td>78</td><td>91</td><td>96</td><td>83</td><td>91</td><td>100</td></tr>
                <tr><td>F1 [%]</td><td>81</td><td>99</td><td>98</td><td>83</td><td>91</td><td>94</td><td>81</td><td>93</td><td>99</td></tr>
              </tbody>
            </table>
          </table-wrap>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion</title>
      <p>In this paper, we have presented a novel end-to-end system for extracting text from natural scene images. We introduced a robust detector that can suitably localize and extract the regions where text exists, which leads to an appreciable increase in accuracy when recognizing the text. The proposed detector is resistant to background complexity and is insensitive to noise, scale changes, and variations in font and language. Moreover, a modular Latin text recognition method is proposed to accurately recognize text in different situations. In this work, we additionally employed CapsNet with dynamic routing for the recognition of detected text. After dividing the detected text into sub-images of individual characters using a specific segmentation method based on a BiLSTM network, CapsNet is leveraged to classify the diverse characters into tens of categories. Furthermore, we proposed a semantic method as a post-processing step to improve the performance and accuracy of the system in full-word recognition.</p>
      <p>Experimental results on different popular text spotting benchmarks, including both regular and irregular datasets, show that our proposed model can significantly outperform state-of-the-art methods in detection and recognition, with high efficiency and accuracy. In future work, this system will be tested on Chinese and other languages. Future work will also look at improving our model to deal with the problems of false positives and partially detected text lines, especially those belonging to arbitrarily oriented and curved textual regions.</p>
    </sec>
    <sec id="sec-5">
      <title>6. Declaration on Generative AI</title>
      <sec id="sec-5-1">
        <title>During the preparation of this work, the authors used</title>
        <p>ChatGPT, Grammarly in order to: Grammar and spelling
check, Paraphrase and reword. After using this
tool/service, the authors reviewed and edited the content as
needed and take full responsibility for the publication’s
content.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W.</given-names>
            <surname>Guettala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sayah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kahloul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          ,
          <article-title>Real time human detection by unmanned aerial vehicles</article-title>
          , in: 2022
          <source>International Symposium on iNnovative Informatics of Biskra (ISNIB)</source>
          , IEEE,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Djedi</surname>
          </string-name>
          ,
          <article-title>Gene regulatory network to control and simulate virtual creature's locomotion (</article-title>
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Boutarfaia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. E.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          ,
          <article-title>Deep learning for eeg-based motor imagery classification: Towards enhanced human-machine interaction and assistive robotics</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3695</volume>
          ,
          <year>2023</year>
          , p.
          <fpage>68</fpage>
          -
          <lpage>74</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Brandizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fanti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gallotta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Iocchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>Unsupervised pose estimation by means of an innovative vision transformer</article-title>
          ,
          <source>in: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)</source>
          , volume
          <volume>13589</volume>
          LNAI,
          <year>2023</year>
          , p.
          <fpage>3</fpage>
          -
          <lpage>20</lpage>
          . doi:10.1007/978-3-031-23480-4_1.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          ,
          <string-name>
            <surname>W. GUETTALA</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. E.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          ,
          <article-title>Efficient one-stage deep learning for text detection in scene images</article-title>
          , Electrotehnica, Electronica,
          <source>Automatica (EEA) 72</source>
          (
          <year>2024</year>
          )
          <fpage>65</fpage>
          -
          <lpage>71</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Amine</surname>
          </string-name>
          ,
          <article-title>An end-to-end trainable capsule network for image-based character recognition and its application to video subtitle recognition</article-title>
          .,
          <source>ICTACT Journal on Image &amp; Video Processing</source>
          <volume>11</volume>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bonanno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Capizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gagliano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>Optimal management of various renewable energy sources by a new forecasting method</article-title>
          ,
          <source>in: SPEEDAM 2012 - 21st International Symposium on Power Electronics, Electrical Drives, Automation and Motion</source>
          ,
          <year>2012</year>
          , p.
          <fpage>934</fpage>
          -
          <lpage>940</lpage>
          . doi:10.1109/SPEEDAM.2012.6264603.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bonanno</surname>
          </string-name>
          , G. Capizzi,
          <string-name>
            <given-names>G. L.</given-names>
            <surname>Sciuto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pappalardo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tramontana</surname>
          </string-name>
          ,
          <article-title>A cascade neural network architecture investigating surface plasmon polaritons propagation for thin metals in openmp</article-title>
          ,
          <source>in: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)</source>
          , vol
          <article-title>- tection and recognition</article-title>
          ,
          <source>Archives of computational ume 8467 LNAI</source>
          ,
          <year>2014</year>
          , p.
          <fpage>22</fpage>
          -
          <lpage>33</lpage>
          . doi:
          <volume>10</volume>
          .1007/ methods in engineering 27 (
          <year>2020</year>
          )
          <fpage>433</fpage>
          -
          <lpage>454</lpage>
          .
          <fpage>978</fpage>
          -3-
          <fpage>319</fpage>
          -07173-
          <issue>2</issue>
          _
          <fpage>3</fpage>
          . [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Brisinello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Grbić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vranješ</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vranješ</surname>
          </string-name>
          , Re-
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.-F.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hou</surname>
          </string-name>
          , C.-L. Liu,
          <article-title>Text localization in view on text detection methods on scene images, natural scene images based on conditional random in: international symposium ELMAR</article-title>
          , IEEE,
          <year>2019</year>
          , ifeld, in: 10th international conference on docu- pp.
          <fpage>51</fpage>
          -
          <lpage>56</lpage>
          .
          <article-title>ment analysis and recognition</article-title>
          , IEEE,
          <year>2009</year>
          , pp.
          <fpage>6</fpage>
          -
          <lpage>10</lpage>
          . [22]
          <string-name>
            <given-names>B.</given-names>
            <surname>Nail</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Atoussi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Saadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. E.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Akrour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Khamar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. E.</given-names>
            <surname>Tiber- C. Napoli</surname>
          </string-name>
          ,
          <article-title>Real-time synchronisation of multiple macine, A. Rabehi, Comparative analysis of svm fractional-order chaotic systems: an application and cnn classifiers for eeg signal classification in study in secure communication, Fractal and Fracresponse to diferent auditory stimuli</article-title>
          ,
          <source>in: 2024 tional 8</source>
          (
          <year>2024</year>
          ) 104. International Conference on Telecommunications [23]
          <string-name>
            <given-names>K. I.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Texture-based approach and Intelligent Systems (ICTIS)</article-title>
          , IEEE,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
          <article-title>for text detection in images using support vector</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          , W. Liu,
          <article-title>Textboxes: machines and continuously adaptive mean shift A fast text detector with a single deep neural net- algorithm, IEEE Transactions on Pattern Analysis work</article-title>
          ,
          <source>in: Proceedings of the AAAI conference on and Machine Intelligence</source>
          <volume>25</volume>
          (
          <year>2003</year>
          )
          <fpage>1631</fpage>
          -
          <lpage>1639</lpage>
          . artificial intelligence, volume
          <volume>31</volume>
          ,
          <year>2017</year>
          . [24]
          <string-name>
            <given-names>L.</given-names>
            <surname>Neumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          ,
          <article-title>Scene text localization and</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qiao</surname>
          </string-name>
          ,
          <article-title>Detect- recognition with oriented stroke detection, in: Proing text in natural image with connectionist text ceedings of the ieee international conference on proposal network</article-title>
          ,
          <source>in: European conference on computer vision</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>97</fpage>
          -
          <lpage>104</lpage>
          . computer vision, Springer,
          <year>2016</year>
          , pp.
          <fpage>56</fpage>
          -
          <lpage>72</lpage>
          . [25]
          <string-name>
            <given-names>B.</given-names>
            <surname>Nail</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Djaidir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. E.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          , C. Napoli,
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ruan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Haidour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Abdelaziz</surname>
          </string-name>
          ,
          <article-title>Gas turbine vibration Textsnake: A flexible representation for detecting monitoring based on real data and neuro-fuzzy systext of arbitrary shapes</article-title>
          ,
          <source>in: Proceedings of the tem, Diagnostyka</source>
          <volume>25</volume>
          (
          <year>2024</year>
          ).
          <article-title>European conference on computer vision</article-title>
          (ECCV), [26]
          <string-name>
            <surname>X.-C. Yin</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Huang</surname>
          </string-name>
          , H.-W. Hao, Robust text
          <year>2018</year>
          , pp.
          <fpage>20</fpage>
          -
          <lpage>36</lpage>
          .
          <article-title>detection in natural scene images</article-title>
          , IEEE transac-
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , W. He,
          <source>tions on pattern analysis and machine intelligence J. Liang, East: an eficient and accurate scene text 36</source>
          (
          <year>2013</year>
          )
          <fpage>970</fpage>
          -
          <lpage>983</lpage>
          . detector, in: Proceedings of the IEEE conference [27]
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. E.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Tibermacine, on
          <source>Computer Vision and Pattern Recognition</source>
          ,
          <year>2017</year>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chebana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nahili</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Starczewscki</surname>
          </string-name>
          , C. Napoli, pp.
          <fpage>5551</fpage>
          -
          <lpage>5560</lpage>
          .
          <article-title>Analyzing eeg patterns in young adults exposed to</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Belongie</surname>
          </string-name>
          ,
          <article-title>Detecting oriented text diferent acrophobia levels: a vr study, Frontiers in natural images by linking segments</article-title>
          , in: Proceed- in
          <source>Human Neuroscience</source>
          <volume>18</volume>
          (
          <year>2024</year>
          ). doi:
          <volume>10</volume>
          .3389/ ings of the IEEE conference
          <article-title>on computer vision</article-title>
          and fnhum.
          <year>2024</year>
          .
          <volume>1348154</volume>
          . pattern recognition,
          <year>2017</year>
          , pp.
          <fpage>2550</fpage>
          -
          <lpage>2558</lpage>
          . [28]
          <string-name>
            <given-names>I.</given-names>
            <surname>Naidji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Guettala</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. E</surname>
          </string-name>
          . Tiber-
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>B.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lyu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <article-title>Robust scene macine</article-title>
          , et al.,
          <article-title>Semi-mind controlled robots based text recognition with automatic rectification, in: on reinforcement learning for indoor application</article-title>
          .,
          <source>Proceedings of the IEEE conference on computer in: ICYRIME</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>51</fpage>
          -
          <lpage>59</lpage>
          . vision and pattern recognition,
          <year>2016</year>
          , pp.
          <fpage>4168</fpage>
          -
          <lpage>4176</lpage>
          . [29]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <surname>Faster</surname>
          </string-name>
          r-cnn:
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Jaderberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Simonyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vedaldi</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          <article-title>Zisser- Towards real-time object detection with region proman, Synthetic data and artificial neural networks posal networks, IEEE transactions on pattern analfor natural scene text recognition</article-title>
          ,
          <source>arXiv preprint ysis and machine intelligence</source>
          <volume>39</volume>
          (
          <year>2016</year>
          )
          <fpage>1137</fpage>
          -
          <lpage>1149</lpage>
          . arXiv:
          <volume>1406</volume>
          .2227 (
          <year>2014</year>
          ). [30]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bouchelaghem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. E.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Balsi</surname>
          </string-name>
          , M. Mo-
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. E.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          , M. Zouai, roni, C. Napoli,
          <article-title>Cross-domain machine learning A. Rabehi, Eeg classification using contrastive learn- approaches using hyperspectral imaging for plasing and riemannian tangent space representations, tics litter detection</article-title>
          , in: 2024 IEEE Mediterranean in: 2024 International Conference on Telecommuni- and
          <string-name>
            <surname>Middle-East Geoscience</surname>
          </string-name>
          and
          <article-title>Remote Sensing cations and Intelligent Systems (ICTIS)</article-title>
          , IEEE,
          <year>2024</year>
          , Symposium (M2GARSS), IEEE,
          <year>2024</year>
          , pp.
          <fpage>36</fpage>
          -
          <lpage>40</lpage>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          . [31]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Djedi</surname>
          </string-name>
          , Neat neural networks to P. Fu,
          <string-name>
            <surname>Z. Luo,</surname>
          </string-name>
          <article-title>R 2 cnn: Rotational region cnn for control and simulate virtual creature's locomotion, arbitrarily-oriented scene text detection</article-title>
          ,
          <source>in: 24th in: 2014 International Conference on Multimedia International conference on pattern recognition Computing and Systems (ICMCS)</source>
          , IEEE,
          <year>2014</year>
          , pp.
          <source>(ICPR)</source>
          , IEEE,
          <year>2018</year>
          , pp.
          <fpage>3610</fpage>
          -
          <lpage>3615</lpage>
          .
          <fpage>9</fpage>
          -
          <lpage>14</lpage>
          . [32]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ma</surname>
          </string-name>
          , W. Shao,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>H.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Review of scene text de- X. Xue,
          <article-title>Arbitrary-oriented scene text detection via rotation proposals</article-title>
          , IEEE transactions on multime- [45]
          <string-name>
            <given-names>C.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shi</surname>
          </string-name>
          , W. Liu, Strokelets: A learned dia
          <volume>20</volume>
          (
          <year>2018</year>
          )
          <fpage>3111</fpage>
          -
          <lpage>3122</lpage>
          . multi
          <article-title>-scale representation for scene text recogni-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ladjal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. E.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bechouat</surname>
          </string-name>
          , M. Se- tion, in: Proceedings of the IEEE conference on draoui, C. Napoli,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rabehi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lalmi</surname>
          </string-name>
          ,
          <article-title>Hybrid mod- computer vision</article-title>
          and pattern recognition,
          <year>2014</year>
          , pp.
          <article-title>els for direct normal irradiance forecasting: A case 4042-4049. study of ghardaia zone (algeria</article-title>
          ),
          <source>Natural Hazards</source>
          [46]
          <string-name>
            <surname>C.-Y. Wang</surname>
          </string-name>
          , H.
          <string-name>
            <surname>-Y. M. Liao</surname>
          </string-name>
          , Y.
          <string-name>
            <surname>-H. Wu</surname>
          </string-name>
          , P.-Y. Chen, J.-W.
          <volume>120</volume>
          (
          <year>2024</year>
          )
          <fpage>14703</fpage>
          -
          <lpage>14725</lpage>
          . Hsieh,
          <string-name>
            <given-names>I.-H.</given-names>
            <surname>Yeh</surname>
          </string-name>
          ,
          <article-title>Cspnet: A new backbone that can</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Du</surname>
          </string-name>
          , Textfusenet:
          <article-title>Scene enhance learning capability of cnn, in: Proceedings text detection with richer fused features</article-title>
          .,
          <source>in: IJCAI, of the IEEE/CVF conference on computer vision</source>
          and volume
          <volume>20</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>516</fpage>
          -
          <lpage>522</lpage>
          . pattern recognition workshops,
          <year>2020</year>
          , pp.
          <fpage>390</fpage>
          -
          <lpage>391</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>S.</given-names>
            <surname>eddine Boukredine</surname>
          </string-name>
          , E. Mehallel,
          <string-name>
            <given-names>A.</given-names>
            <surname>Boualleg</surname>
          </string-name>
          , [47]
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jia</surname>
          </string-name>
          , Path aggregation
          <string-name>
            <given-names>O.</given-names>
            <surname>Baitiche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rabehi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Guermoui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Douara</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. E.</surname>
          </string-name>
          <article-title>network for instance segmentation</article-title>
          , in: Proceedings Tibermacine,
          <article-title>Enhanced performance of microstrip of the IEEE conference on computer vision and antenna arrays through concave modifications and pattern recognition</article-title>
          ,
          <year>2018</year>
          , pp.
          <fpage>8759</fpage>
          -
          <lpage>8768</lpage>
          .
          <article-title>cut-corner techniques</article-title>
          ,
          <source>ITEGAM-JETIA</source>
          <volume>11</volume>
          (
          <year>2025</year>
          ) [48]
          <string-name>
            <given-names>V.</given-names>
            <surname>Lcvenshtcin</surname>
          </string-name>
          ,
          <article-title>Binary coors capable or 'correct65-71. ing deletions, insertions, and reversals</article-title>
          , in: Soviet
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>A.</given-names>
            <surname>Farhadi</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. Redmon,</surname>
          </string-name>
          <article-title>Yolov3: An incremental Physics-Doklady</article-title>
          , volume
          <volume>10</volume>
          ,
          <year>1966</year>
          , pp.
          <fpage>707</fpage>
          -
          <lpage>710</lpage>
          . improvement, in: Computer vision and pattern [49]
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          , L. Han,
          <article-title>Distance weighted cosine similarrecognition</article-title>
          , volume
          <volume>1804</volume>
          , Springer Berlin/Heidel- ity
          <article-title>measure for text classification</article-title>
          ,
          <source>in: Intelligent berg, Germany</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
          <string-name>
            <given-names>Data</given-names>
            <surname>Engineering</surname>
          </string-name>
          and Automated Learning-IDEAL,
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ponzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Puglisi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          , I. Tiber- Springer,
          <year>2013</year>
          , pp.
          <fpage>611</fpage>
          -
          <lpage>618</lpage>
          . macine, et al., Exploiting robots as healthcare re- [50]
          <string-name>
            <given-names>D.</given-names>
            <surname>Karatzas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Shafait</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Uchida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Iwamura</surname>
          </string-name>
          ,
          <string-name>
            <surname>L. G.</surname>
          </string-name>
          <article-title>sources for epidemics management and support i Bigorda</article-title>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Mestre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. F.</given-names>
            <surname>Mota</surname>
          </string-name>
          , J. A. caregivers,
          <source>in: CEUR Workshop Proceedings</source>
          , vol- Almazan,
          <string-name>
            <given-names>L. P.</given-names>
            <surname>De Las Heras</surname>
          </string-name>
          ,
          <source>Icdar 2013 robust ume 3686</source>
          ,
          <string-name>
            <surname>CEUR-WS</surname>
          </string-name>
          ,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . reading competition, in: 2013 12th international
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. E.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          , En- conference
          <article-title>on document analysis and recognition, hancing eeg signal reconstruction in cross-</article-title>
          <string-name>
            <surname>domain</surname>
            <given-names>IEEE</given-names>
          </string-name>
          ,
          <year>2013</year>
          , pp.
          <fpage>1484</fpage>
          -
          <lpage>1493</lpage>
          .
          <article-title>adaptation using cyclegan</article-title>
          , in: 2024 International [51]
          <string-name>
            <given-names>D.</given-names>
            <surname>Karatzas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gomez-Bigorda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nicolaou</surname>
          </string-name>
          , Conference on Telecommunications and
          <string-name>
            <surname>Intelligent S. Ghosh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Bagdanov</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Iwamura</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Matas</surname>
          </string-name>
          ,
          <source>Systems (ICTIS)</source>
          , IEEE,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . L.
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>V. R.</given-names>
          </string-name>
          <string-name>
            <surname>Chandrasekhar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Lu</surname>
          </string-name>
          , et al.,
          <source>Icdar</source>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>W.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Anguelov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Erhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Szegedy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Reed</surname>
          </string-name>
          ,
          <year>2015</year>
          competition on robust reading, in: 2015 13th C.
          <article-title>-</article-title>
          <string-name>
            <surname>Y. Fu</surname>
            ,
            <given-names>A. C.</given-names>
          </string-name>
          <string-name>
            <surname>Berg</surname>
          </string-name>
          , Ssd: Single shot multibox de- international
          <source>conference on document analysis and tector</source>
          , in: Computer Vision-ECCV
          <year>2016</year>
          :
          <article-title>14th Eu- recognition (ICDAR)</article-title>
          , IEEE,
          <year>2015</year>
          , pp.
          <fpage>1156</fpage>
          -
          <lpage>1160</lpage>
          . ropean Conference, Amsterdam, The Netherlands, [52]
          <string-name>
            <given-names>C.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Bai</surname>
          </string-name>
          , W. Liu,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tu</surname>
          </string-name>
          , Detecting texts Springer,
          <year>2016</year>
          , pp.
          <fpage>21</fpage>
          -
          <lpage>37</lpage>
          .
          <article-title>of arbitrary orientations in natural images</article-title>
          ,
          <source>in: 2012</source>
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vedaldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          ,
          <article-title>Synthetic data IEEE conference on computer vision and pattern for text localisation in natural images</article-title>
          , in: Proceed- recognition, IEEE,
          <year>2012</year>
          , pp.
          <fpage>1083</fpage>
          -
          <lpage>1090</lpage>
          .
          <article-title>ings of the IEEE conference on computer vision</article-title>
          and [53]
          <string-name>
            <given-names>N.</given-names>
            <surname>Nayef</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yin</surname>
          </string-name>
          , I. Bizid,
          <string-name>
            <given-names>H.</given-names>
            <surname>Choi</surname>
          </string-name>
          , Y. Feng, pattern recognition,
          <year>2016</year>
          , pp.
          <fpage>2315</fpage>
          -
          <lpage>2324</lpage>
          . D.
          <string-name>
            <surname>Karatzas</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          <string-name>
            <surname>Pal</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Rigaud</surname>
          </string-name>
          , J. Chazalon,
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>P.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          , Sin- et al.,
          <article-title>Icdar2017 robust reading challenge on multigle shot text detector with regional attention, in: lingual scene text detection and script identificationProceedings of the IEEE international conference rrc-mlt</article-title>
          ,
          <source>in: 14th IAPR international conference on on computer vision</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>3047</fpage>
          -
          <lpage>3055</lpage>
          .
          <article-title>document analysis and recognition (ICDAR)</article-title>
          , vol-
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>M.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Bai</surname>
          </string-name>
          , Textboxes++
          <article-title>: A single-shot ume 1</article-title>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>1454</fpage>
          -
          <lpage>1459</lpage>
          .
          <article-title>oriented scene text detector</article-title>
          , IEEE transactions on [54]
          <string-name>
            <surname>T. E. de Campos</surname>
            ,
            <given-names>B. R.</given-names>
          </string-name>
          <string-name>
            <surname>Babu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Varma</surname>
          </string-name>
          ,
          <source>Character image processing 27</source>
          (
          <year>2018</year>
          )
          <fpage>3676</fpage>
          -
          <lpage>3690</lpage>
          .
          <article-title>recognition in natural images</article-title>
          , in: International con-
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>B.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <article-title>An end-to-end trainable neu- ference on computer vision theory and applications, ral network for image-based sequence recognition volume 1</article-title>
          , SCITEPRESS,
          <year>2009</year>
          , pp.
          <fpage>273</fpage>
          -
          <lpage>280</lpage>
          .
          <article-title>and its application to scene text recognition</article-title>
          , IEEE [55]
          <string-name>
            <given-names>W.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.-M. Ogier</surname>
          </string-name>
          , C.
          <article-title>- transactions on pattern analysis and machine intel- L. Liu, Realtime multi-scale scene text detection ligence 39 (</article-title>
          <year>2016</year>
          )
          <fpage>2298</fpage>
          -
          <lpage>2304</lpage>
          .
          <article-title>with scale-based region proposal network</article-title>
          , Pattern
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>R.</given-names>
            <surname>Minetto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Thome</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. J.</given-names>
            <surname>Leite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Stolfi</surname>
          </string-name>
          ,
          <source>T- Recognition</source>
          <volume>98</volume>
          (
          <year>2020</year>
          )
          <article-title>107026. hog: An efective gradient-based descriptor for sin-</article-title>
          [56]
          <string-name>
            <given-names>M.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <article-title>Real-time gle line text regions</article-title>
          ,
          <source>Pattern recognition 46</source>
          (
          <year>2013</year>
          )
          <article-title>scene text detection with diferentiable binarization</article-title>
          ,
          <fpage>1078</fpage>
          -
          <lpage>1090</lpage>
          . in
          <source>: Proceedings of the AAAI conference on artificial intelligence</source>
          , volume
          <volume>34</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>11474</fpage>
          -
          <lpage>11481</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [57]
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Yuille</surname>
          </string-name>
          ,
          <article-title>Detecting and reading text in natural scenes</article-title>
          ,
          <source>in: Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition</source>
          , volume
          <volume>2</volume>
          , IEEE,
          <year>2004</year>
          , pp.
          <article-title>II-II.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [58]
          <string-name>
            <given-names>M.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shi</surname>
          </string-name>
          , G.-s. Xia,
          <string-name>
            <given-names>X.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <article-title>Rotation-sensitive regression for oriented scene text detection</article-title>
          ,
          <source>in: Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>5909</fpage>
          -
          <lpage>5918</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [59]
          <string-name>
            <given-names>M.</given-names>
            <surname>Soomro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Farooq</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. H.</given-names>
            <surname>Raza</surname>
          </string-name>
          ,
          <article-title>Performance evaluation of advanced deep learning architectures for offline handwritten character recognition</article-title>
          ,
          <source>in: 2017 International Conference on Frontiers of Information Technology (FIT)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>362</fpage>
          -
          <lpage>367</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [60]
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Newell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Griffin</surname>
          </string-name>
          ,
          <article-title>Multiscale histogram of oriented gradient descriptors for robust character recognition</article-title>
          ,
          <source>in: International conference on document analysis and recognition</source>
          , IEEE,
          <year>2011</year>
          , pp.
          <fpage>1085</fpage>
          -
          <lpage>1089</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [61]
          <string-name>
            <given-names>S.</given-names>
            <surname>Arivazhagan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Arun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Rathina</surname>
          </string-name>
          ,
          <article-title>Recognition of handwritten characters using deep convolution neural network</article-title>
          .,
          <source>Journal of the National Science Foundation of Sri Lanka</source>
          <volume>49</volume>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>