<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Character Detection Algorithm Based on Yolov5</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Changhao Lao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Weiping Hu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuge Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiujing Fan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Electronic Engineering, Guangxi Normal University</institution>
          ,
          <addr-line>Guilin Guangxi</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <fpage>122</fpage>
      <lpage>126</lpage>
      <abstract>
<p>Dot-matrix inkjet codes on product packaging are a difficult problem in industrial inspection because of their complex backgrounds, diverse characters and changeable fonts. An improved YOLOV5 algorithm is proposed to detect inkjet codes against the complex backgrounds of goods. By adding a segmentation decoding task to the output layer, precise location of the inkjet region is achieved and the inkjet code is effectively separated from the complex background to obtain a pure inkjet region, which simplifies the subsequent recognition task. At the same time, the SE attention mechanism is used to make the network pay more attention to the extraction of dot-character features, improving the accuracy of inkjet-code location and segmentation. Finally, the improved algorithm is combined with CRNN character recognition. In actual measurement on the production line of a food packaging factory, the character positioning accuracy is 98.7% and the recognition rate is 98.5%, with good robustness.</p>
      </abstract>
      <kwd-group>
<kwd>Yolov5 algorithm</kwd>
        <kwd>Target detection</kwd>
        <kwd>Segmentation decoder</kwd>
        <kwd>Attention mechanism</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>1. Introduction</title>
      <p>The proposed method therefore has application value for the detection of ink-jet characters, and can be practically applied to the deployment
of industrial production lines.</p>
    </sec>
    <sec id="sec-2">
<title>2. Improved Yolov5 Network</title>
<p>Yolov5 performs excellently in both precision and speed on open-source datasets. However,
when it is applied to the commodity inkjet-code inspection task, the diversity of commodities and the
complexity and diversity of background colors mean that the inkjet code is often submerged in a dark background,
making the text difficult to identify. To solve the problem of text that cannot be recognized because of
complex background interference, this paper locates the ink-jet region based on Yolov5
and separates the ink-jet characters from the complex background to obtain pure ink-jet characters.</p>
    </sec>
    <sec id="sec-2-1">
      <title>2.1. The output layer is improved into two task-specific decoders</title>
    </sec>
    <sec id="sec-3">
      <title>2.1.1. Character positioning decoder</title>
<p>The character positioning decoder retains the anchor-based multi-scale detection scheme adopted
by Yolov5. First, we use the path aggregation network (PAN), a bottom-up feature
pyramid structure. The FPN transfers semantic features from top to bottom and combines them for better
feature fusion; the multi-scale fused feature maps in the PAN are then used directly for detection. Finally,
three prior anchors with different aspect ratios are assigned to each grid cell of the multi-scale feature
map. The detector head predicts the position offset, the height and width scaling, and the
corresponding class probabilities and prediction confidence.</p>
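<p>As an illustration of the decoding just described, a single raw prediction can be turned into an absolute box as follows. This is a minimal sketch based on the standard Yolov5 parameterisation; the function names and the exact bounding formulas are assumptions, not taken from this paper.</p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_box(t, anchor_wh, cell_xy, stride):
    # t holds the raw outputs (tx, ty, tw, th) for one prior anchor
    # in one grid cell; anchor_wh is the prior's size in pixels.
    tx, ty, tw, th = t
    cx, cy = cell_xy
    pw, ph = anchor_wh
    # position offset: the centre is bounded around the responsible cell
    bx = (2.0 * sigmoid(tx) - 0.5 + cx) * stride
    by = (2.0 * sigmoid(ty) - 0.5 + cy) * stride
    # height/width scaling: the prior anchor is scaled by a factor in (0, 4)
    bw = pw * (2.0 * sigmoid(tw)) ** 2
    bh = ph * (2.0 * sigmoid(th)) ** 2
    return bx, by, bw, bh
```

<p>With all-zero raw outputs, the decoded box sits half a cell inside the grid cell at exactly the anchor's own size, which is the neutral point of this parameterisation.</p>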
    </sec>
    <sec id="sec-4">
      <title>2.1.2. Character Splitter Decoder</title>
      <p>
        The character segmentation decoder classifies the image pixel by pixel, judging whether each pixel
belongs to an inkjet character or to the background [
        <xref ref-type="bibr" rid="ref7">6</xref>
        ]. The backbone network extracts features
shared with the character positioning task, and the bottom level of the FPN, of size (W/8, H/8, 256), is fed to the segmentation
branch. After three upsampling steps, the segmentation branch restores
the output feature map to size (W, H, 2), representing the probability that each pixel of the
input image belongs to an inkjet character or to the background. Because downsampling loses
information from the feature map, the feature map is channel-concatenated with the shallow
feature maps before each of the three upsampling steps to recover the lost detail.
      </p>
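<p>The shape arithmetic of this branch, three 2x upsampling steps with skip concatenation taking (H/8, W/8, 256) to (H, W, 2), can be sketched as follows. The nearest-neighbour upsampling, the 64-channel width and the random weights are illustrative assumptions, not the paper's actual layers:</p>

```python
import numpy as np

def upsample2x(x):
    # nearest-neighbour 2x upsampling of an (H, W, C) feature map
    return x.repeat(2, axis=0).repeat(2, axis=1)

def conv1x1(x, w):
    # pointwise convolution: (H, W, Cin) @ (Cin, Cout)
    return x @ w

def seg_branch(feat, skips, rng):
    # feat: bottom FPN level of shape (H/8, W/8, 256); skips: shallower
    # maps at (H/4, W/4, *) and (H/2, W/2, *), channel-concatenated
    # before each upsampling step to recover lost detail
    x = feat
    for skip in skips:
        x = upsample2x(x)
        x = np.concatenate([x, skip], axis=-1)
        x = conv1x1(x, rng.standard_normal((x.shape[-1], 64)))
    x = upsample2x(x)
    # two output channels: inkjet character vs. background
    return conv1x1(x, rng.standard_normal((64, 2)))
```

<p>Running this on a 64 x 64 input (so the bottom level is 8 x 8 x 256) yields an output of shape (64, 64, 2), one score pair per input pixel.</p>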
    </sec>
    <sec id="sec-5">
      <title>2.2. Attention mechanism</title>
<p>Because the task of this paper is to detect and identify characters that occupy only a few regions of
the image, while background information occupies a large part of it, after the image has been
convolved many times much of the extracted information is useless interference, and the resulting
noise can cover the characteristic information of the inkjet characters, leading to poor
segmentation. To this end, we add the SE attention mechanism to the network, which improves the
network's sensitivity to informative inkjet features by explicitly modeling the
interdependencies between the feature channels of the network, and ultimately enables the
network to segment the inkjet characters better.</p>
      <p>[Figure 1: structure of the SE module: global pooling (H * W * C to 1 * 1 * C), FC and ReLU (to 1 * 1 * C/r), FC (back to 1 * 1 * C), Sigmoid, and channel-wise scaling. Figure 2: improved network structure: input, CSP-Darknet feature extraction, FPN/PAN feature fusion, SE attention, and a prediction layer with positioning and segmentation branches.]</p>
<p>As shown in Figure 1, the SE module consists of three steps. First, a global average is computed
over the width and height of the input feature map, reducing the spatial dimension
to 1 * 1. Second, fully connected layers establish the connections
between channels. Finally, the sigmoid activation function produces normalized weights. Each
normalized weight is applied to the corresponding channel of the original feature map by multiplication,
completing the channel-wise recalibration of the original features. The improved network
structure is shown in Figure 2.</p>
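<p>The three steps above can be sketched in a few lines. The weight shapes follow the standard SE layout with reduction ratio r; the concrete sizes in the example are only illustrative:</p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(x, w1, w2):
    # x: (H, W, C) feature map; w1: (C, C/r); w2: (C/r, C)
    s = x.mean(axis=(0, 1))        # step 1: squeeze H * W * C to 1 * 1 * C
    z = np.maximum(s @ w1, 0.0)    # step 2: FC plus ReLU, down to C/r channels
    w = sigmoid(z @ w2)            # step 3: FC back to C, sigmoid weights in (0, 1)
    return x * w                   # recalibrate each channel by its weight
```

<p>The output keeps the input shape; only the relative magnitudes of the channels change, which is what lets the network emphasise inkjet-character features over background noise.</p>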
      <p>In post-processing, NMS is applied to the positioning predictions, the predicted coordinate
information is extracted, and the segmented character image is retained.</p>
    </sec>
    <sec id="sec-6">
      <title>3. Experiment and result analysis</title>
    </sec>
    <sec id="sec-7">
      <title>3.1. Data Set Introduction</title>
<p>The data set used in the experiment consists of food packaging boxes provided by Dejunwang Food Co.,
Ltd. in Jinjiling, Guilin. There are 12 categories with 200 pictures each, 2400 pictures in
total. This paper selects 2000 of them as the training set and expands the text detection data set to 4000
images through data augmentation.</p>
      <p>The processor used in the experiment is an AMD Ryzen 5 3500X, the graphics card is an NVIDIA RTX
2070 Super, and the operating system is Windows 10, 64-bit. The whole experiment uses PyTorch 1.8 and Python
3.7 to build the network model.</p>
    </sec>
    <sec id="sec-8">
<title>3.2. Model training</title>
<p>The improved Yolov5 text detection network and the CRNN text recognition network are trained and tested. The
text detection experiments use the Adam optimizer. The initial learning
rate is set to 1e-4, the batch size is 8, and training runs for 100 epochs.</p>
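<p>To make the optimizer settings concrete, a single Adam update with the learning rate above (1e-4) looks as follows; this is the textbook Adam rule, sketched here only for reference:</p>

```python
import numpy as np

def adam_step(p, g, m, v, t, lr=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    # one Adam update of parameter p given gradient g at step t,
    # carrying the first/second moment estimates m and v
    m = b1 * m + (1.0 - b1) * g
    v = b2 * v + (1.0 - b2) * g * g
    m_hat = m / (1.0 - b1 ** t)      # bias correction
    v_hat = v / (1.0 - b2 ** t)
    p = p - lr * m_hat / (np.sqrt(v_hat) + eps)
    return p, m, v
```

<p>At the first step the bias-corrected moments cancel the momentum decay, so the very first update moves each parameter by almost exactly the learning rate in the direction opposite its gradient.</p>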
    </sec>
    <sec id="sec-9">
      <title>3.3. Experimental results and analysis</title>
<p>To verify the improved YOLOV5 inkjet text detection method proposed in this paper, experiments
were conducted on 500 test images to evaluate the overall performance of combining CRNN with
the improved YOLOV5 text detection network. The evaluation index for text recognition is the accuracy
obtained by comparing the predicted text with the ground-truth text: if three of four characters are
predicted correctly, the accuracy of the text prediction sequence is 75%.</p>
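<p>The sequence-accuracy example above can be written directly; the position-wise comparison is an assumption about how the rate is computed, since the paper does not give the exact formula:</p>

```python
def char_accuracy(pred, truth):
    # position-wise character accuracy over two equal-length strings
    if len(pred) != len(truth):
        raise ValueError("sequences must have equal length")
    correct = sum(p == t for p, t in zip(pred, truth))
    return correct / len(truth)
```

<p>For example, a prediction that confuses the letter O for the digit 0 in a four-character date code gets three of four positions right, i.e. an accuracy of 0.75, matching the example in the text.</p>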
<p>Table 1 shows the ablation results of the improved Yolov5 text detection algorithm used
to separate the position of the inkjet code from the background. In this paper, a segmentation decoder and an
improved loss function are added to the YOLOV5 network, enabling it to locate the ink-jet characters and
separate them from the background. With the SE attention module added, the
recognition accuracy improves by 4.9 percentage points.</p>
<p>The left image in Figure 3 shows the input image to be detected, and the right image
shows the output of the improved Yolov5 network.</p>
<p>To verify the effectiveness of the improved Yolov5 algorithm for inkjet detection against complex
backgrounds, several common character detection schemes are compared; the results are shown in
Table 2. The common methods do not remove the background interference, which leads to insufficient
final recognition accuracy. The improved Yolov5 algorithm in this paper raises the positioning
accuracy by 2.8 percentage points and at the same time segments the inkjet code from the background.
Because the algorithm obtains a pure inkjet-code area, the recognition rate is significantly
improved.</p>
      <table-wrap id="tab2">
        <label>Table 2</label>
        <caption>
          <p>Comparison of common character algorithms</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Detection network</th>
              <th>Positioning accuracy</th>
              <th>Recognition rate</th>
              <th>Speed (fps)</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>ABCNet</td><td>——</td><td>96.0%</td><td>5.0</td></tr>
            <tr><td>FOTS</td><td>——</td><td>93.5%</td><td>14.0</td></tr>
            <tr><td>CTPN+CRNN</td><td>95.8%</td><td>92.1%</td><td>4.5</td></tr>
            <tr><td>Yolov5+CRNN</td><td>95.9%</td><td>87.6%</td><td>32.0</td></tr>
            <tr><td>Improved network</td><td>98.7%</td><td>98.5%</td><td>16.0</td></tr>
          </tbody>
        </table>
      </table-wrap>
<p>On comprehensive consideration, the improved Yolov5 algorithm effectively combines the tasks
of semantic segmentation and object detection to obtain an ink-jet region without background.
Compared with the inkjet-code area obtained by the unmodified YOLOV5, the character recognition
accuracy using the CRNN network is significantly improved. The final test results show that the character
location accuracy and recognition rate are 98.7% and 98.5%, respectively.</p>
    </sec>
    <sec id="sec-10">
<title>4. Conclusions</title>
<p>To solve the problem of ink-jet code detection on packaging boxes in food production, this paper
proposes an improved segmentation-and-detection algorithm for ink-jet characters based on Yolov5. A
semantic segmentation decoder is added to the output layer of the network, combining the
semantic segmentation and target detection tasks. At the same time, the attention
mechanism enhances the extraction of dot-character and ink-dot features, effectively improving the
segmentation between ink-jet characters and background and the localization of the ink-jet region. The clean character
area obtained by the improved Yolov5 algorithm greatly improves the character recognition rate, and
the algorithm can effectively detect the ink-jet area on food packaging boxes. In future work, we will try to
extract features of small targets more effectively, so as to optimize the segmentation between
inkjet characters and background, obtain purer character regions, and further improve the
character recognition rate.</p>
<p>Fund project: Guangxi Postgraduate Education Innovation
Project (XYCSZ2021002); Chinese National Natural Science
Foundation (61966004).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref2">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Wu</surname>
            <given-names>Huiying</given-names>
          </string-name>
          , Chen Ming, Fan Yanjun, Huang Shuai.
          <article-title>Research on lattice character recognition in complex background [J]</article-title>
          .
          <source>Computer Applications and Software</source>
          ,
          <year>2021</year>
          ,
          <volume>38</volume>
          (
          <issue>09</issue>
          ):
          <fpage>146</fpage>
          -
          <lpage>152</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Wang</surname>
            <given-names>JX</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>ZY</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tian</surname>
            <given-names>X</given-names>
          </string-name>
          .
          <article-title>Review of natural scene text detection and recognition based on deep learning</article-title>
          .
          <source>Ruan Jian Xue Bao/Journal of Software</source>
          ,
          <year>2020</year>
          ,
          <volume>31</volume>
          (
          <issue>5</issue>
          ):
          <fpage>1465</fpage>
          −
          <lpage>1496</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Qian</surname>
            <given-names>X Q</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>W F</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>J</given-names>
          </string-name>
          and
          <string-name>
            <surname>Cao</surname>
            <given-names>Y</given-names>
          </string-name>
          .
          <article-title>Underwater-relevant image object detection based on feature-degraded enhancement method</article-title>
          .
          <source>Journal of Image and Graphics</source>
          ,
          <volume>27</volume>
          (
          <issue>11</issue>
          ) :
          <fpage>3185</fpage>
          -
          <lpage>3198</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Zhou</surname>
            <given-names>B</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            <given-names>C H</given-names>
          </string-name>
          and
          <string-name>
            <surname>Chen</surname>
            <given-names>W.</given-names>
          </string-name>
          <year>2021</year>
          .
          <article-title>Region-level channel attention for single image superresolution combining high frequency loss</article-title>
          .
          <source>Jour-nal of Image and Graphics</source>
          ,
          <volume>26</volume>
          (
          <issue>12</issue>
          ):
          <fpage>2836</fpage>
          -
          <lpage>2847</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Duan</surname>
            <given-names>B</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fu</surname>
            <given-names>X</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            <given-names>Y</given-names>
          </string-name>
          and
          <string-name>
            <surname>Zeng</surname>
            <given-names>J X</given-names>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Lightweight blurred car plate recognition method combined with generated images</article-title>
          .
          <source>Journal of Image and Graphics</source>
          ,
          <volume>25</volume>
          (
          <issue>09</issue>
          ) :
          <fpage>1813</fpage>
          -
          <lpage>1824</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Bi</surname>
            <given-names>XL</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiao</surname>
            <given-names>B</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            <given-names>WS</given-names>
          </string-name>
          .
          <article-title>Pancreas Segmentation Based on Dual-decoding UNet</article-title>
          .
          <source>Ruan Jian Xue Bao/Journal of Software</source>
          ,
          <year>2022</year>
          ,
          <volume>33</volume>
          (
          <issue>5</issue>
          ):
          <fpage>1947</fpage>
          -
          <lpage>1958</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>