Segmentation of analogue meter readings using neural networks

Vadym Slyusar 1, Ihor Sliusar 2, Nataliia Bihun 1, and Volodymyr Piliuhin 2

1 Central Research Institute of Armaments and Military Equipment of Armed Forces of Ukraine, Povitrophlotsky Av., 28B, Kyiv, 03049, Ukraine
2 Poltava State Agrarian University, str. G. Skovorody, 1/3, Poltava, 36003, Ukraine

Abstract
The report discusses options for solving the problem of segmenting images of the digital indicators of analogue water or gas meters using neural networks. The results of a comparative analysis of various neural network implementations based on PSP, U-Net, and U-Net2 are presented. The Water Meters Dataset, freely available on the Kaggle website, was used for training and validation. The analysis compared various parameters of the learning process as well as the accuracy achieved on the validation sample. The maximum accuracy reached 86.5% with the PSPBlock2D neural network and 88.8% with the light version of U-Net.

Keywords
Neural Network, segmentation, U-Net, PSP

1. Introduction

As is well known, one of the constraints on implementing the concepts of Smart Home, Smart City, Industry 4.0, IoT, Agriculture 4.0, etc. is the need to integrate analogue energy metering devices, for example, water, gas, and electricity meters. Replacing them with digital devices is often not cost-effective [1]. This may be due to a ban on modifying existing communications, the high cost of developing design and technical documentation, a large number of analogue metering devices at the enterprise, etc. One option to overcome this barrier could be a combination of artificial intelligence and Internet of Things (AI + IoT) technologies. In this approach, an optical channel that digitizes readings through a recognition operation is very often used to transform analogue meters into digital ones.
Solutions [2] and [3] should be mentioned as examples of such an approach. However, they have significant drawbacks that hinder mass adoption: high requirements for the spatial stability of the image and for exposure parameters, the lack of unification across meter types and fonts, and sensitivity to vibrations inherent in certain technological production processes. In addition, the use of edge computing has to adapt to limited computing resources, e.g. low-resolution images (typically around 28x28 pixels), the need for manual segmentation, accurate initial set-up, and manual correction of reading data. One option for solving this problem is to apply an image segmentation procedure before recognition. The technical implementation of this approach is possible thanks to fog computing technologies. Today, several approaches to segmentation are in use, considered, for example, in [4, 5].

MoMLeT+DS 2022: 4th International Workshop on Modern Machine Learning Technologies and Data Science, November 25-26, 2022, Leiden-Lviv, The Netherlands-Ukraine.
EMAIL: swadim@ukr.net (V. Slyusar); islyusar2007@ukr.net (I. Sliusar); bigun0717@ukr.net (N. Bihun); vovi202020@gmail.com (V. Piliuhin)
ORCID: 0000-0002-2912-3149 (V. Slyusar); 0000-0003-1197-5666 (I. Sliusar); 0000-0003-3327-5521 (N. Bihun); 0000-0001-6113-0843 (V. Piliuhin)
©️ 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

2. The Aim of the Research

The aim of the work is a comparative analysis of the accuracy of possible neural network based solutions to the problem of semantic segmentation of images of digital meter displays.

3. The Main Results of the Study

As is well known, a key point in the application of neural networks is the choice of dataset.
In the field of meter segmentation, this problem is made easier by the fact that the relevant dataset is publicly available on the Kaggle website [6]. Since a common option for solving the problem of semantic image segmentation is the PSP neural network [7], we consider this approach as a starting point for the research, using the architecture of the so-called large PSP shown in Figure 1. Since this neural network structure assumes a 16-fold reduction of the data matrix, the image format used for learning should be evenly divisible by 16. For this reason, the original 1000x1778 pixel images of the Water Meters dataset [6] were first resampled to sizes that are multiples of 16.

Figure 1: 4-channel PSP

As a first step in solving the learning problem, the 128x224 pixel format was chosen as the closest in proportion to the original photos. In particular, for an image with a frame side of 128 pixels, recalculation with a factor of 1778/1000 gives 227.584; rounding this to 224 pixels should be almost invisible. Alternatively, doubling the shorter side of the frame to 256 pixels and multiplying by 1778/1000 gives 455.168; in this variant, the closest multiples of 16 would be 448 or 464 pixels. Similarly, at 240 pixels we get 240 x (1778/1000) = 426.72, with the corresponding nearest multiple of 16 being 432. It should be noted that the choice of recompressed image format, in addition to preserving the proportions of the original dataset images as far as possible, should also take into account the limitations of the computing resources on which the neural network is trained. To make the most of these resources, the study used the GPU capabilities of Google's Colab Pro+ service for learning.
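The size calculations above can be sketched in a few lines; this is a minimal illustration, and the helper names are ours rather than anything from the paper:

```python
# Sketch: choosing training image sizes that are multiples of 16 while
# approximately preserving the 1000x1778 aspect ratio of the dataset images.
ASPECT = 1778 / 1000  # long side / short side of the original photos

def nearest_multiple_of_16(x: float) -> int:
    """Round x to the nearest integer multiple of 16."""
    return int(round(x / 16)) * 16

def training_size(short_side: int) -> tuple[int, int]:
    """Return (short, long) with the long side snapped to a multiple of 16."""
    return short_side, nearest_multiple_of_16(short_side * ASPECT)

print(training_size(128))  # -> (128, 224): 128 * 1.778 = 227.584
print(training_size(240))  # -> (240, 432): 240 * 1.778 = 426.72
print(training_size(256))  # -> (256, 448): 455.168 lies between 448 and 464
```

The short side must itself be a multiple of 16, which 128, 240, and 256 all are.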
The learning process of the PSP neural network was performed with a learning step of 0.001 and a batch size of 16, since setting the batch size to 32 triggered an out-of-resources error. The learning dataset contained 870 images, and the validation dataset contained 374 images. The masks used for image segmentation were black and white: the black background occupied 98% of the area, while the white cutout for the digital display accounted for the remaining 2%. The run time for 200 learning epochs in standard Google Colab Pro+ connection mode with a V100 graphics card equipped with 16 GB of RAM was 32 minutes. The maximum learning accuracy of the original large PSP reached 70% at the 73rd epoch. Continuing learning to 400 epochs made it possible to achieve an accuracy of 75.2% at the 356th epoch.

Next, we explored a modification of the original PSP architecture in which the Conv2DTranspose layers were replaced with UpSampling and MaxPooling with AveragePooling (Figure 2). The validation results of this neural network after learning allow us to conclude that the original PSP architecture with Conv2DTranspose layers trains worse on the described dataset than the modification with UpSampling layers. In particular, the modified version achieved an average class accuracy of 81.1% as early as the 48th epoch.

Figure 2: Modification of the original PSP neural network

An even greater improvement in accuracy was achieved by an alternative modification of the PSP network of Figure 1, which consisted in increasing the number of convolutional layers with the ReLU activation function in each of the channels to 8. The size of their kernels remained the same (3x3), fixed for all 4 channels (pooling branches), and the number of convolution kernels remained 16.
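The layer substitution of Figure 2 can be illustrated outside any framework; the following is a minimal single-channel NumPy sketch of the two replacement operations (2x2 average pooling and 2x nearest-neighbour upsampling), not the paper's actual Keras layers:

```python
import numpy as np

# Sketch of the substitution described above: AveragePooling on the way down
# and nearest-neighbour UpSampling on the way up, instead of MaxPooling and
# Conv2DTranspose. Plain NumPy, single channel, even-sized inputs only.

def average_pool_2x2(x: np.ndarray) -> np.ndarray:
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample_2x(x: np.ndarray) -> np.ndarray:
    return x.repeat(2, axis=0).repeat(2, axis=1)

x = np.array([[1., 3.], [5., 7.]])
print(average_pool_2x2(upsample_2x(x)))  # upsample then pool recovers x
```

Unlike Conv2DTranspose, nearest-neighbour upsampling has no trainable weights, which is one reason the modified branch is lighter.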
In this case, batch normalization layers were used in each channel and, additionally, a Dropout layer with a data thinning factor of 0.1 was applied at the channel output. This structure was given the conventional name PSPBlock2D (Figure 3). As a result of learning for 160 epochs, the accuracy of the class with the worst segmentation quality on the test sample reached 73%, and the average class accuracy reached 86.2%. Figure 4 illustrates the described learning process, and Figure 5 shows the segmentation quality on the validation set.

Figure 3: PSPBlock2D neural network structure

Figure 4: PSPBlock2D neural network learning results

At the next stage, the studies were carried out using a neural network of the U-Net type [8-12]. Figure 6 shows the architecture of the light version of U-Net. The relative simplicity of its architecture made it possible to switch to the original 432x240 pixel dataset and easily carry out long-term learning for 622 epochs with a batch size of 16 and a final learning step of 0.00001. The calculation time of one epoch fluctuated within 27-28 sec. Already at the 54th epoch an accuracy of 88.4% was achieved, and it then took more than 400 epochs for the maximum accuracy to stabilize at 88.8%, reached at the 464th epoch. Not only the architecture of the neural network contributed to the improvement in accuracy, but also the larger image format during learning. This is confirmed by the results of the more complex structure of the so-called medium U-Net, schematically shown in Figure 7, with a learning image format of 224x128 pixels: an accuracy of 87.8% was achieved at the 49th epoch with a batch size of 32. The architecture of this neural network included 5 serially connected base blocks CB (Figure 8) in the descending branch and 4 base blocks IB (Figure 9) in the ascending branch.
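The "data thinning factor of 0.1" at the channel output can be read as a standard dropout rate; a minimal NumPy sketch of inverted dropout under that assumption (not the paper's implementation) is:

```python
import numpy as np

# Sketch: inverted dropout with rate 0.1, assuming the "thinning factor"
# means each activation is zeroed with probability 0.1 during training and
# the survivors are rescaled by 1/(1 - 0.1) to keep the expected value.
def dropout(x: np.ndarray, rate: float = 0.1, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

x = np.ones(1000)
y = dropout(x, rate=0.1)
print(y.mean())  # close to 1.0 on average thanks to the rescaling
```

At inference time the layer is simply an identity, which is how Keras-style frameworks also behave.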
Figure 5: PSPBlock2D segmentation quality on the validation set

Figure 6: Light version of U-Net

Inside each base block of both branches, the same type of Conv2D and Conv2DTranspose convolutions with ReLU activation functions, respectively, were used. However, the number of convolution filters increased in powers of 2 from 16 to 256 in the descending branch and decreased in the reverse order, from 128 to 16, in the ascending branch when moving from one base block to the next. In the MaxPool2D layers, the pool size is 2x2. The quantity of filters in the Conv2D layers of the CB_m blocks is K = 2^(3+m), m = 1, ..., 5; the kernel size is 3x3, strides are 1x1, padding is "same", and the activation function is ReLU.

Figure 7: Modified version of U-Net medium architecture

Figure 8: Typical downstream middle U-Net building block (CB)

The quantity of filters in the Conv2DTranspose and Conv2D layers of the IB_r blocks is L = 2^(8-r), r = 1, ..., 4. The kernel size of Conv2D is 3x3 with strides of 1x1, padding "same", and ReLU activation. The kernel size of Conv2DTranspose is 2x2 with strides of 2x2.

We also studied a structurally similar variant, called the large U-Net, which differed by increasing the number of convolution filters in the descending blocks through the sequence 64, 128, 256, 512, 1024 and changing them in reverse order in the ascending branch. Contrary to expectations, this manoeuvre with the architecture parameters did not improve the accuracy, which was limited to 87% at the 54th epoch with the same batch size (32) and learning step.

A further complication of the architecture was the transition to a neural network of the U-Net++ type (Figure 10). In the course of computational experiments, it was found that this neural network works with a batch size of 4, but not as efficiently as the large and medium U-Net. As might be expected, a batch size of 8 at a learning step of 0.001 gives better accuracy than a batch size of 4. U-Net++ also works with a batch size of 16, but much worse at 0.001.
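The two filter-count formulas above unroll to the sequences stated in the text, which a two-line check confirms:

```python
# Sketch: filter counts implied by the formulas above, K = 2**(3+m) for the
# five descending CB blocks and L = 2**(8-r) for the four ascending IB blocks.
K = [2 ** (3 + m) for m in range(1, 6)]  # CB_1 .. CB_5
L = [2 ** (8 - r) for r in range(1, 5)]  # IB_1 .. IB_4
print(K)  # [16, 32, 64, 128, 256]
print(L)  # [128, 64, 32, 16]
```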
The resulting accuracy for different batch sizes is presented in Table 1.

Figure 9: Typical IB building block of the ascending branch of the middle U-Net

Figure 10: Architecture "U-Net++" from the Terra AI framework

The so-called U-Net2 [13, 14] was considered as the most complex neural network architecture (Figure 11). In this case, the 240x432 pixel image dataset made it possible to work with batch sizes of 4, 8, and 16. The maximum accuracy with a batch size of 16 and a learning step of 0.001 was 86% at the 18th epoch, and with a batch size of 8 it was 88.5% at the same 18th epoch. Thus, the U-Net2 neural network demonstrated more intensive learning.

Table 1
The resulting accuracy of U-Net++ for different batch sizes

Batch    Accuracy, %    Epoch
8        85.2           33
16       71.5           18
32       83.8           76

Figure 11: A general view of the architecture "U-Net2" from the Terra AI framework

Since a batch size of 32 with 240x432 images causes an out-of-resources error, for this batch size a transition was made to the smaller 128x224 pixel image format. In this case, an accuracy of 85.8% was obtained at the 71st epoch. A comparison of the architectures of all considered neural networks is presented in Table 2. As can be seen, a larger architecture does not necessarily give a better result.

Table 2
The comparison of used neural networks

Architecture     Total parameters    Trainable parameters    Non-trainable parameters
PSPBlock2D       34,429,058          34,426,498              2,560
U-Net++          2,084,370           2,081,042               3,328
U-Net2           682,290             678,706                 3,584
PSP (Figure 1)   923,266             923,266                 0
PSP (Figure 2)   574,158             574,152                 6
Large U-Net      31,060,226          31,046,530              13,696
Medium U-Net     1,948,226           1,944,802               3,424
Light U-Net      1,869,826           1,866,882               2,944

The hardware implementation can be based on the Raspberry Pi Zero processor board, with the neural network on the ESP32, and other solutions (Figure 12) proposed in [15].
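The totals in Table 2 are sums of standard per-layer parameter counts; as a cross-checking aid, here is a minimal sketch of the usual Keras-style counting rules (the helper functions are ours, not code from the paper). Batch normalization is the typical source of non-trainable parameters, since its moving mean and variance (2 per channel) are updated during training but not trained by gradient descent:

```python
# Sketch: per-layer parameter counting rules (Keras conventions) from which
# the totals in Table 2 are composed. Helper names are ours, for illustration.
def conv2d_params(in_ch: int, filters: int, k: int = 3) -> int:
    # k*k*in_ch weights per filter, plus one bias per filter
    return filters * (k * k * in_ch + 1)

def batchnorm_params(channels: int) -> tuple[int, int]:
    # (trainable gamma/beta, non-trainable moving mean/variance)
    return 2 * channels, 2 * channels

print(conv2d_params(3, 16))   # 448 parameters for a 3x3 conv, 3 -> 16 channels
print(batchnorm_params(16))   # (32, 32)
```

Under these rules, a single batch normalization over 3 channels would contribute exactly 6 non-trainable parameters, consistent with the PSP of Figure 2 in Table 2.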
Alternatively, the ESP32-CAM module (Figure 13) [16] can be used, which is currently the most cost-effective option for implementing edge IoT (Figure 14).

Figure 12: Raspberry Pi Zero 2W and ESP32

Figure 13: ESP32-CAM

Figure 14: Options for mounting it on meters

4. Perspectives of Further Research

Given the relevance of edge computing, it is advisable to explore the possibility of implementing segmentation and recognition based on TensorFlow Lite on devices such as the ESP32-CAM. The approach considered in the paper can be used beyond the creation of digital infrastructure. A promising direction is the development of uncrewed platforms based on vehicles initially designed for human control. To read the indicators of their sensors, for example the speedometer, engine speed, or oil pressure, video cameras with neural networks can be used, similar to the options discussed here for household meters. The authors plan to continue research on neural networks based on Object Detection technology, marking digital displays with bounding boxes, using, for example, the results of [17]. In addition, it is also of interest to generalize the approach considered here to pointer-type analogue devices and to use pre-trained image classification neural networks within the segmentation network structure.

5. Conclusion

The presence of outdated energy accounting equipment in the infrastructure makes it impossible to fully realize integration with the IoT ecosystem. Consequently, the transition to Industry 4.0 can be very complicated, and choosing the right solution path at the design stage will play a key role in the future. The use of optical recognition of analogue meter readings ensures minimal interference with the existing production process and, most importantly, requires neither stopping it nor interrupting its monitoring. Therefore, such solutions are quite popular.
To eliminate the restrictions of edge computing, the data processing model in the IoT ecosystem should be based on fog computing. In this case, it becomes possible to perform an image segmentation procedure before recognition, including one based on neural networks whose architectures are too complex for edge computing. The paper considers an approach based on modifications of PSP, U-Net, and U-Net2. To evaluate the synthesized architectures, the accuracy on the validation set was used. Its maximum value is 88.8%, obtained with the lightweight U-Net neural network and a learning image format of 224x128 pixels. The proposed solutions can be used for other AI + IoT applications.

6. References

[1] Gaz-counter. URL: https://github.com/maleficxp/gaz-counter.
[2] AI-on-the-edge-device. URL: https://github.com/jomjol/AI-on-the-edge-device.
[3] Analog meters in the digital enterprise: change or integrate? URL: https://habr.com/ru/company/lanit/blog/676240/.
[4] F. Yang, Q. Sun, H. Jin and Z. Zhou, Superpixel segmentation with fully convolutional networks, in: Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition, 2020, pp. 13964-13973.
[5] V. Slyusar, M. Protsenko, A. Chernukha, V. Melkin, O. Petrova, M. Kravtsov, S. Velma, N. Kosenko, O. Sydorenko and M. Sobol, Improving a neural network model for semantic segmentation of images of monitored objects in aerial photographs, Eastern-European Journal of Enterprise Technologies, vol. 2, no. 6 (114), 2021, pp. 86-95. doi:10.15587/1729-4061.2021.248390.
[6] R. Kucev, Water Meters Dataset. Hot and cold water meters dataset. URL: https://www.kaggle.com/datasets/tapakah68/yandextoloka-water-meters-dataset.
[7] H. Zhao, J. Shi, X. Qi, X. Wang and J. Jia, Pyramid Scene Parsing Network. URL: https://arxiv.org/abs/1612.01105.
[8] O. Ronneberger, P. Fischer, and T. Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation. URL: https://arxiv.org/pdf/1505.04597.pdf.
[9] W. Jwaid, Z.
Al-Husseini and A. Sabry, Development of brain tumor segmentation of magnetic resonance imaging (MRI) using U-Net deep learning, Eastern-European Journal of Enterprise Technologies, vol. 4, no. 9 (112), 2021, pp. 23-31. doi:10.15587/1729-4061.2021.238957.
[10] N. Singh and K. Nongmeikapam, Semantic segmentation of satellite images using deep-UNet, Arabian Journal for Science and Engineering, 2022, pp. 1-13.
[11] A. Soni, R. Koner, and V. Villuri, M-Unet: Modified U-Net segmentation framework with satellite imagery, in: Proceedings of the Global AI Congress 2019, Springer, 2020, pp. 47-59.
[12] E. Irwansyah, Y. Heryadi, and A. Gunawan, Semantic image segmentation for building detection in urban area with aerial photograph image using U-Net models, in: Proceedings of the 2020 IEEE Asia-Pacific Conf. on Geoscience, Electronics and Remote Sensing Technology (AGERS), 2020, pp. 48-51.
[13] X. Qin, Z. Zhang, C. Huang, M. Dehghan, O. Zaiane and M. Jagersand, U2-Net: Going deeper with nested U-structure for salient object detection, Pattern Recognition, 2020, 106, 107404.
[14] F. Ge, G. Wang, G. He, D. Zhou, R. Yin and L. Tong, A Hierarchical Information Extraction Method for Large-Scale Centralized Photovoltaic Power Plants Based on Multi-Source Remote Sensing Images. URL: https://www.mdpi.com/2072-4292/14/17/4211.
[15] H. Padmasiri, J. Shashirangana, D. Meedeniya, O. Rana and C. Perera, Automated License Plate Recognition for Resource-Constrained Environments. URL: https://www.mdpi.com/1424-8220/22/4/1434/htm.
[16] ESP32-CAM. URL: https://www.espressif.com/en/news/ESP32_CAM.
[17] V. Slyusar, M. Protsenko, A. Chernukha, S. Gornostal, S. Rudakov, S. Shevchenko, O. Chernikov, N. Kolpachenko, V. Timofeyev and R. Artiukh, Construction of an advanced method for recognizing monitored objects by a convolutional neural network using a discrete wavelet transform, Eastern-European Journal of Enterprise Technologies, vol. 4, no. 9 (112), 2021, pp. 65-77.
doi:10.15587/1729-4061.2021.238601.