Tunnel cable defect detection method based on deep
                                learning and Global-Local features
                                Yang Zhao1, Yingqiang Shang1, Tian Guo1, Zixi Han2, ∗, Zhaoyi Zhang2, Jirong Fu2 and
                                Yongrui Zhang2

                                1 State Grid Beijing Electric Power Company, China

                                2 Institute of Next Generation Power Systems and International Standards, Wuhan University, China


                                               Abstract
                                               A tunnel cable fault detection method based on deep learning and global-local features is
                                               proposed to meet the requirements of tunnel cable inspection and improve the safety of cable
                                               operation. First, a method based on color uniqueness and compactness is provided to highlight
                                               the salient foreground regions in the inspection image. Then, according to the relationship
                                               between the global spatial scenario distribution and local information, a global context model and
                                               a local fine detection model are established to deeply and accurately compute the salient features
                                               of the image. A cyclic structure network is used to weight each position of the feature maps, and
                                               a cyclic connection is established by feeding the output of each block back to its input.
                                               Repeated iteration and noise filtering reduce the influence of background information. Finally,
                                               a case study comparing the proposed scheme with four classical target recognition algorithms
                                               verifies its advantages in training time and recognition accuracy.

                                               Keywords
                                               tunnel cable inspection; deep learning; local fine feature


                                1. Introduction
                                With the gradual improvement of urban underground distribution networks, the
                                construction and maintenance of cable tunnels play an increasingly important role in the
                                development process of new power systems. However, the natural geographical
                                environment of cable tunnels leads to problems such as water seepage, moisture, cable
                                aging, and equipment corrosion and damage [1,2]. At present, the maintenance of cables
                                and their supporting equipment in tunnels relies mainly on manual work. Thick smoke,
                                harmful gases, high-voltage cable leakage, and tunnel collapse may all threaten the
                                safety of workers.
                                ICCIC 2024: International Conference on Computer and Intelligent Control, June 29–30, 2024, Kuala Lumpur,
                                Malaysia
                                ∗ Corresponding author.
                                Email: 15101691629@163.com (Y. Zhao); 819900413@qq.com (Y. Shang); guotian124com@163.com (T. Guo);
                                2018302070047@whu.edu.cn (Z. Han); zhangzhaoyi@whu.edu.cn (Z. Zhang); 2020302191016@whu.edu.cn (J. Fu);
                                2020302191606@whu.edu.cn (Y. Zhang)
                                ORCID: 0009-0007-7289-5498 (Z. Han)
                                © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

    Owing to the rapid development of robotics and image recognition technology, using
inspection robots to examine the interior of cable tunnels has gradually become an
effective solution. In recent years, many domestic and foreign companies and research
institutions have developed tunnel inspection robots for different application fields. The
water tunnel robot developed in [3] can effectively repair the inner wall of the tunnel,
improving construction efficiency and work safety. Ref. [4] developed an underground cable
detection robot called "Patrol", which can enter the underground and crawl along the cable
to find the fault location. At present, most research focuses on the mechanical structure of
tunnel robots and on their path planning and obstacle avoidance, with relatively
little work on the intelligent functions of robots. In the complex and
humid environment inside cable tunnels, an equipment fault identification scheme for cable
tunnel inspection robots is therefore important both for the construction and
maintenance of the tunnel and for the safety of maintenance
personnel inside the tunnel.
    In recent years, deep learning has been widely applied in the field of image
recognition. Ref. [5] trains two deep neural networks to compute pixel-level features and global
proposals, respectively, and combines them to determine the salient regions of the
image. Ref. [6] proposes dense and sparse reconstruction (DSR), which builds a
saliency model on reconstruction errors: it computes dense and sparse reconstruction errors
for each region of the image and uses the K-Means algorithm to cluster the regions when
deriving saliency from the reconstruction errors. Ref. [7] obtains more accurate saliency
map inference by incorporating prior knowledge of saliency. Ref. [8] proposes a multi-task
deep learning algorithm that applies a Laplacian nonlinear regression model to refine
saliency, an approach more commonly used in semantic segmentation and
multi-object detection tasks. Ref. [9] proposes a bidirectional learning framework that
aggregates multi-level convolutional features for salient object detection (SOD) and
improves the robustness and accuracy of saliency detection.
    Based on the above research, this paper proposes a cable tunnel defect detection method
based on deep learning and global-local features. It fully considers the relationship between
local fine features and global spatial scene distribution to improve the accuracy of salient
feature map detection. A cyclic structure network module is proposed to continuously
collect contextual information and iteratively improve convolutional features. The
proposed method can effectively identify three common tunnel cable faults: insulation air
gaps, insulation scratches, and insulation surface impurities. The
proposed tunnel cable defect detection method provides a scalable intelligent application
model for tunnel inspection robot systems, laying the foundation for the further
development and implementation of other related applications.

2. Composition of robot system
Power cables are increasingly favored by governments and power departments for urban
power lines, so laying power cables is becoming an increasingly common power supply method
in power construction. High-voltage power cables
are installed in underground passages (pipe galleries or tunnels). A cable tunnel is a
corridor or tunnel-like structure that accommodates a large number of cables and
provides sufficient protection for them. China's cable tunnel technology began in the
1980s and has undergone decades of development. Complex urban cable tunnel
networks have now been built in major cities in China. Many cable tunnels face problems such as
collapse, seepage, water accumulation, fire, cable smoldering, aging and rusting of power
supporting facilities, animal intrusion, and accumulation of toxic and combustible gases.
To ensure the safety of cables inside the tunnel and maintain the stable operation
of the entire power grid, staff must inspect the cable tunnel regularly.
However, because many old cable tunnels were built long ago and their internal environment
is harsh, inspectors must work in high temperature and humidity and also face the constant
threat of landslides and toxic gases
during inspection. These factors seriously threaten the lives of
cable tunnel inspectors. At present, many cable tunnels are equipped with online monitoring
devices or track-type inspection robots. The online monitoring devices mostly
use wired communication, which makes installation difficult before or during
the technical transformation of the tunnel. For example, distributed fiber optic
temperature measurement requires laying fiber optic cables along the entire length of the
cable body, while other monitoring methods place monitoring equipment only at the
cable accessories, which is not cost-effective and makes monitoring accuracy difficult to ensure.
Track-type robots require track construction inside tunnels, which is costly, requires
a long construction period, poses a great threat to personnel in harsh environments, and is
prone to damaging tunnel structures.
    The working system of the cable tunnel inspection robot includes an algorithm
workstation, a ground station, a cable tunnel inspection robot, and the cameras and sensors
mounted on it, as shown in Fig. 1. The equipment carried by the robot includes visible light
cameras, thermal imaging cameras, gas sensors, relay boards, WiFi boards, gimbals, and
lithium batteries. To overcome the limited wireless signal transmission in the
complex environment of cable tunnels, this system converts the local area network at the
bottom of the cable tunnel into high-power 5.8 GHz WiFi to communicate with ground
stations.


[Figure 1 labels: LAN; algorithm workstation; ground station; antenna feeder; cable tunnel cover plate; WiFi antenna; WiFi; inspection robot; cable tunnel ground]

Figure 1: Composition of cable inspection robot system.

3. Cable defect detection combining deep learning and global-local
features
This paper first uses color uniqueness and color compactness to compute the salient
foreground region of an image, preserving the structural information of salient regions and
the contours of salient objects. Then, based on a convolutional neural network (CNN), a
saliency detection network is built within a single deep learning framework that fully
considers the relationship between local fine features and global spatial scene
distribution. A module with contextual relationships weights each position of the
feature maps, so that the proposed method detects local features of salient objects and refines the
salient feature map based on the relationship between each pixel and its adjacent regions.
Finally, a cyclic structure network module feeds the output of each block
back to its input to establish a cyclic connection, iteratively filtering out background
noise to obtain the salient objects.

3.1 Detection of salient foreground features
According to the basic principles of the human visual attention system, visual cues
such as color uniqueness and color compactness improve saliency detection.
Color uniqueness, as a measure of contrast, mainly describes the differences in color and
brightness between different regions. Color compactness is a spatial distribution feature:
salient objects typically have a compact spatial distribution, while the background
area is spread more widely across the entire image.
   This paper combines color uniqueness and color compactness to compute foreground
salient features. Firstly, given image I, a uniqueness map $S_U$ is obtained based on a center
prior using positional information. For compactness, the image is decomposed
using a Gaussian mixture model (GMM), and the saliency map $S_C$ is then
obtained through feature clustering. Unlike methods that select the best
function from uniqueness [10,11], this paper simply multiplies $S_C$ and $S_U$ to
obtain a composite salient foreground region.

$$S_{FG} = S_C \cdot S_U \tag{1}$$

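As a minimal sketch of Eq. (1), assuming both saliency maps are stored as equally sized 2-D grids with values in [0, 1], the composite foreground map is an element-wise product. The function name `combine_saliency` and the toy values are illustrative assumptions, not taken from the paper.

```python
from typing import List

Map = List[List[float]]


def combine_saliency(s_c: Map, s_u: Map) -> Map:
    """Element-wise product S_FG = S_C * S_U of two equally sized maps."""
    same_rows = len(s_c) == len(s_u)
    if not same_rows or any(len(a) != len(b) for a, b in zip(s_c, s_u)):
        raise ValueError("saliency maps must have the same shape")
    return [
        [c * u for c, u in zip(row_c, row_u)]
        for row_c, row_u in zip(s_c, s_u)
    ]


# Toy 2x2 maps (made-up numbers, for illustration only).
s_c = [[0.9, 0.2], [0.5, 0.0]]  # compactness map S_C
s_u = [[0.8, 0.5], [0.5, 1.0]]  # uniqueness map S_U
s_fg = combine_saliency(s_c, s_u)
```

With a product, a position scores high only when both cues agree: a value near zero in either map suppresses the composite response at that position.
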
3.2 Global Context Detection Model
Firstly, the 256×256×3 image obtained after superpixel segmentation is taken as input, where
the three dimensions represent width, height, and number of channels, respectively. The
network consists of 5 convolutional layers and 2 fully connected layers. Here, conv denotes
a convolutional layer, pool a pooling layer, and fc a fully
connected layer. The conv and fc layers are linear transformations, and
the relu layer applies the rectified linear unit (ReLU) nonlinearity. Only the conv and fc
layers have learnable parameters. For each layer, the size of the feature map is
defined as width×height×depth, where the first two dimensions describe the spatial size
and depth refers to the number of channels. The last layer of the network has two
neurons, whose Softmax output represents the probabilities that the
superpixel under test belongs to the background or to a salient object.
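The two-neuron Softmax output described above can be sketched as follows. This is an illustrative example only: the function name `two_class_softmax` and the input scores are assumptions, and the subtraction of the maximum is a standard numerical-stability step that the paper does not mention.

```python
import math
from typing import Tuple


def two_class_softmax(z_background: float, z_salient: float) -> Tuple[float, float]:
    """Return (P(background), P(salient)) from the two last-layer scores."""
    # Subtracting the larger score avoids overflow in exp() and does not
    # change the resulting probabilities.
    shift = max(z_background, z_salient)
    e_bg = math.exp(z_background - shift)
    e_sal = math.exp(z_salient - shift)
    total = e_bg + e_sal
    return e_bg / total, e_sal / total


# Hypothetical scores for one superpixel.
p_bg, p_sal = two_class_softmax(0.3, 1.7)
label = "salient" if p_sal > p_bg else "background"
```
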

3.3 Local fine detection model
The upper branch calculates the approximate salient region based on global contextual
relationships, while the lower branch is used to detect local details of salient objects. The
local fine detection model adopts inputs similar to the global context detection model, which
are then normalized to 227×227×3 sized images. The upper and lower branch models share
the same deep structure but have independent parameters [12]. The output for each window is
predicted by estimating the saliency probability score:

$$S_{score}(x_{gc}, x_{lc}) = P(y = 1 \mid x_{gc}, x_{lc}; \theta_1) \tag{2}$$
   where $x_{gc}$ and $x_{lc}$ are the outputs of the second-to-last layers of the global
context detection model and the local fine detection model, respectively. y is the saliency
prediction for a superpixel, where y=1 when the superpixel belongs to a salient region and
y=0 when it belongs to the background.
   To classify salient and background features, the network is trained by minimizing the loss
between the classification result of the last network layer and the segmentation labels, as shown in (3).


$$L\left(\theta; \{x_{gc}^{i}, x_{lc}^{i}, y^{(i)}\}_{i=1}^{m}\right) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j \in \{0,1\}} \mathbf{1}\{y^{(i)} = j\} \log P\left(y^{(i)} = j \mid x_{gc}^{i}, x_{lc}^{i}, \theta_j\right) \tag{3}$$
   The parameters of the algorithm can be decomposed into several parts. In Eq. (2),
$\theta_j = \{w_{gc,j}, w_{lc,j}, \alpha, \beta\}$, where $w_{gc,j}$ and $w_{lc,j}$ are the last-layer
parameters of the global context and local fine detection models, respectively, and $\alpha$ and
$\beta$ weight the global and local modeling. The label probability is calculated via (4).

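$$P(y = j \mid x_{gc}^{i}, x_{lc}^{i}, \theta_j) \propto \mu(x_{gc}, x_{lc}, \theta_j^{\mu}), \quad \theta_j^{\lambda} = w_{gc,j}, \quad \theta_j^{\mu} = \{w_{gc,j}, w_{lc,j}, \alpha, \beta\}, \quad j \in \{0,1\} \tag{4}$$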
   where $\mu$ models the saliency probability jointly from the global context and the
local context, i.e., $\mu(x_{gc}, x_{lc}, \theta_j^{\mu}) \propto e^{f_u(\alpha w_{gc,j}^{T} x_{gc} + \beta) \cdot w_{lc,j}^{T} x_{lc}}$.
   The corresponding unnormalized saliency prediction score function is:

$$f(x_{gc}, x_{lc}, \theta_j^{\mu}) = w_{gc,j}^{T} x_{gc} + f_u(\alpha w_{gc,j}^{T} x_{gc} + \beta)\, w_{lc,j}^{T} x_{lc} \tag{5}$$
   where

$$f_u(t) = \begin{cases} t, & 0 \le t \le 1 \\ 0, & \text{otherwise} \end{cases}$$

$f_u(\alpha w_{gc,j}^{T} x_{gc} + \beta)$ limits the range of the saliency response obtained from the
global context detection model. When $\alpha w_{gc,j}^{T} x_{gc} + \beta$ in (5) falls in
$(-\infty, 0] \cup [1, +\infty)$, the global model is considered to have high confidence in its
saliency prediction. Otherwise, the local fine detection model is assumed to contribute the
non-zero term in Eq. (6).

$$f_u(\alpha w_{gc,j}^{T} x_{gc} + \beta)\, w_{lc,j}^{T} x_{lc} \tag{6}$$

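The gating behavior of the score in (5) can be illustrated with a small sketch for a single class j. It uses the piecewise definition of f_u exactly as given in the text (t on [0, 1], 0 elsewhere). The function names and all feature and weight values are hypothetical.

```python
from typing import Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product w^T x of two equally long vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def f_u(t: float) -> float:
    """Gate as defined in the text: t for 0 <= t <= 1, otherwise 0."""
    return t if 0.0 <= t <= 1.0 else 0.0


def saliency_score(x_gc, x_lc, w_gc, w_lc, alpha: float, beta: float) -> float:
    """Unnormalized score: global term plus the gated local term of Eq. (6)."""
    global_term = dot(w_gc, x_gc)
    gate = f_u(alpha * global_term + beta)
    return global_term + gate * dot(w_lc, x_lc)


# Hypothetical features and last-layer weights.
x_gc, w_gc = [1.0, 2.0], [0.5, 0.25]   # global term = 1.0
x_lc, w_lc = [2.0], [3.0]              # local term  = 6.0

# Gate argument 0.5 lies in [0, 1], so the local term is mixed in.
mixed = saliency_score(x_gc, x_lc, w_gc, w_lc, alpha=0.5, beta=0.0)
# Gate argument 2.0 lies outside [0, 1], so only the global term remains.
global_only = saliency_score(x_gc, x_lc, w_gc, w_lc, alpha=2.0, beta=0.0)
```

Here `mixed` evaluates to 1.0 + 0.5 × 6.0 = 4.0, while `global_only` stays at 1.0 because the gate is zero.
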
3.4 Cyclic network design
A CNN extracts features ranging from low-level visual cues to high-level features, but a simple
combination of convolutional features can allow noise to propagate to the prediction layer
unchecked, or can lose some information during
propagation. Convolutional recurrent structures can better explore and process
contextual information [13,14].
   This paper proposes a reweighting mechanism on top of the initial architecture to adjust
the transmitted features, using a cyclic structure to perceive contextual information by
connecting the output of each block back to its input. Because the same cyclic connection and
shared weights are reused multiple times in each layer, the new architecture increases the depth of
traditional CNN layers without significantly increasing the total number of parameters. The
computed salient region is connected to the first input module of the cyclic structure, and a
downsampling layer follows the feature map $f^{k}$ generated by the k-th block. A convolutional
layer with kernel size m slides over m×m local windows of the feature space, followed by an
L2 normalization layer and finally an activation layer to form the features. The detailed steps are
as follows.
   Firstly, to compute the weight map $M^{k}$, a convolution is applied to
the subsequent output channels, where W denotes the kernel and b
the bias parameter. Each value in the resulting weight map determines
the importance of the corresponding spatial position.
   Then, the Softmax operation is applied spatially to $M^{k}$ to obtain the final weight
map via (7).

$$\lambda^{k}(x, y) = \frac{\exp\left(M^{k}(x, y)\right)}{\sum_{(x', y')} \exp\left(M^{k}(x', y')\right)} \tag{7}$$
    where $\lambda^{k}(x, y)$ is the normalized response value at (x, y) and k indexes the block.
If the pixel at position (x, y) is salient, higher saliency values are assigned to the regions
related to that pixel in the weight map.
    Finally, the weight map is upsampled and used to obtain the features
$f_t^{k}(c) = \lambda_V^{k} \times f_t^{k-1}(c)$, where c indexes the feature channels and
$\lambda_V^{k}$ is shared across all channels of $f_t^{k}$.

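The spatial Softmax in (7) and the channel-shared reweighting can be sketched as follows. This is a simplified illustration under stated assumptions: the weight map and the features are taken to have the same resolution, so the upsampling step is omitted, and the function names and toy values are hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]


def spatial_softmax(m_k: Grid) -> Grid:
    """Normalize a weight map over all spatial positions, as in Eq. (7)."""
    peak = max(max(row) for row in m_k)  # shift for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in m_k]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def reweight(features: List[Grid], weights: Grid) -> List[Grid]:
    """Multiply every channel by the same spatial weight map."""
    return [
        [[w * f for w, f in zip(w_row, f_row)]
         for w_row, f_row in zip(weights, channel)]
        for channel in features
    ]


# A flat weight map gives every position the same weight (1/4 on a 2x2 grid).
flat = spatial_softmax([[0.0, 0.0], [0.0, 0.0]])
# One feature channel on the same 2x2 grid (made-up values).
scaled = reweight([[[4.0, 8.0], [12.0, 16.0]]], flat)
# A peaked weight map concentrates the weight on the largest response.
peaked = spatial_softmax([[1.0, 2.0], [3.0, 4.0]])
```

Because the weights sum to one over the whole map, raising the response at one position necessarily lowers the relative weight of all others, which is how background positions are suppressed across iterations.
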
4. Simulation analysis
To verify the advantages of the proposed scheme in terms of training speed and recognition
accuracy, four classic object detection and recognition algorithms, i.e., the classic CNN
algorithm [15], SVM algorithm [16], BPNN algorithm [17], and ELM algorithm [18], are
provided for comparison. These methods are respectively applied to defect recognition of
internal equipment in cable tunnels in a city in China. The main comparative indicators of
the simulation include network training time and algorithm recognition accuracy.

4.1 Network training time
The five algorithms above are trained on the images collected in cable tunnels.
To minimize the impact of network parameters on the training speed of the five methods,
mini-batch gradient descent is used during training, and the
learning rate is uniformly set to 0.1. The number and resolution of images in
all training datasets are also kept consistent. Because instability during training
can cause the required time to vary between training sessions, the
average over 10 training sessions of each algorithm is taken as the final result. The
training time comparison is shown in Fig. 2.


Figure 2: Comparison of network training time for 5 methods

   As shown in Fig. 2, the proposed algorithm requires significantly less network
training time than the CNN, SVM, and BPNN algorithms. This is because the
convolutional neural network has already obtained suitable weights and biases through pre-training,
which makes it easier for the network to converge during secondary training. The ELM method
has the shortest training time because it only needs to update the weights between the
hidden layer and the output layer. Its entire training process amounts to solving for a
Moore-Penrose (MP) generalized inverse matrix, which is simpler than the network
training process in the other methods.

4.2 Algorithm recognition accuracy
Each of the 5 algorithms is used to identify defects in 100 images collected in cable tunnels.
The recognition results fall into 4 cases: a defect is present in the image and is correctly
identified; no defect is present and none is reported; a defect is present but is not
detected; and no defect is present but one is mistakenly identified. The first two cases
indicate accurate
recognition, while the latter two indicate misidentification. Recognition
accuracy is the quotient of the number of correctly
recognized images and the total number of images; the results are shown in Table 1.
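The accuracy definition above can be written out as a short sketch. It assumes the four outcomes are read as true positive, true negative, missed defect, and false alarm, with the first two counted as correct. The counts below are made up for illustration and are not results from the paper.

```python
def recognition_accuracy(true_pos: int, true_neg: int,
                         missed: int, false_alarm: int) -> float:
    """Correctly recognized images divided by the total number of images."""
    total = true_pos + true_neg + missed + false_alarm
    if total == 0:
        raise ValueError("at least one image is required")
    return (true_pos + true_neg) / total


# Hypothetical outcome counts for 100 test images.
accuracy = recognition_accuracy(true_pos=60, true_neg=36, missed=3, false_alarm=1)
accuracy_percent = 100.0 * accuracy
```
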
   Because the results of all five methods vary from run to run, the
identification accuracies in Table 1 are averages over 20 experiments. According to the
data in the table, four of the five methods (all except ELM) achieve their highest
recognition accuracy on insulation scratches, which may be related to the collected images of
the cable tunnel interior. Because the labeled training and testing image datasets contain more cable
equipment images, the features of cable insulation scratches are easier to capture
and detect. In addition, compared to the four traditional recognition algorithms, the
algorithm proposed in this paper shows higher recognition accuracy. This is because the
method fully considers the relationship between local fine features and global spatial scene
distribution and computes the salient image features accurately, which yields
higher recognition accuracy in the final cable tunnel defect recognition application.
   In summary, although the proposed algorithm is not as fast as the ELM algorithm in
network training, its recognition accuracy is the highest. Moreover, after the network
training is completed, the network can be encapsulated, and the subsequent recognition
process can be carried out through the trained network encapsulation. Therefore, the speed
of network training has little impact on the subsequent recognition process.

Table 1
Comparison of recognition accuracy of 5 algorithms (recognition accuracy, %)

Fault type                      Proposed method   CNN    SVM    BPNN   ELM
Insulation air gap              95.6              94.7   91.7   93.2   88.2
Scratches on insulation         97.3              95.5   92.3   94.3   87.3
Insulation surface impurities   94.8              92.6   88.9   89.7   85.9
Total                           95.9              94.3   91.0   92.4   87.1

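The paper does not state how the Total row of Table 1 is computed. The values are consistent, to one decimal place, with an unweighted mean of the three fault types, which the following sketch checks. The dictionary layout and function names are illustrative assumptions; the numbers are copied from Table 1.

```python
FAULTS = ("air gap", "scratches", "impurities")

# Recognition accuracy in percent, copied from Table 1.
table = {
    "proposed": {"air gap": 95.6, "scratches": 97.3, "impurities": 94.8, "total": 95.9},
    "CNN":      {"air gap": 94.7, "scratches": 95.5, "impurities": 92.6, "total": 94.3},
    "SVM":      {"air gap": 91.7, "scratches": 92.3, "impurities": 88.9, "total": 91.0},
    "BPNN":     {"air gap": 93.2, "scratches": 94.3, "impurities": 89.7, "total": 92.4},
    "ELM":      {"air gap": 88.2, "scratches": 87.3, "impurities": 85.9, "total": 87.1},
}


def mean_accuracy(row: dict) -> float:
    """Unweighted mean over the three fault types."""
    return sum(row[f] for f in FAULTS) / len(FAULTS)


def total_matches_mean(row: dict, tolerance: float = 0.05) -> bool:
    """True if the Total entry equals the mean up to one-decimal rounding."""
    return abs(mean_accuracy(row) - row["total"]) <= tolerance + 1e-9


def best_fault(row: dict) -> str:
    """Fault type on which the method reaches its highest accuracy."""
    return max(FAULTS, key=lambda f: row[f])


consistent = {name: total_matches_mean(row) for name, row in table.items()}
best = {name: best_fault(row) for name, row in table.items()}
```

In this check every Total entry matches the unweighted mean, and every method except ELM peaks on insulation scratches, whereas ELM peaks on insulation air gaps.
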
5. Conclusion
This article proposes a cable tunnel defect detection method based on deep learning and
global-local features, which fully considers the relationship between local fine features and
global spatial scene distribution, and improves the accuracy of salient feature map detection.
A cyclic structure network module is proposed to continuously collect context
information and iteratively improve convolutional features. The proposed method requires
less network training time than the classic CNN, SVM, and BPNN algorithms. Although it
trains more slowly than the ELM algorithm, its final recognition rate (95.9%) is
higher than that of ELM (87.1%). In addition, the trained network can be encapsulated
and used directly for defect recognition in cable tunnel inspection robots, which greatly
improves its engineering applicability.

Acknowledgements
This work was supported by the self-funded project "Key Technology Research on
Collaborative Inspection of Quadruped Robots in Power Tunnels" by State Grid Beijing
Electric Power Company.


References
[1] W. Ding, S. Yuan, X. Gao, et al., “Research on construction disturbance characteristics
    caused by super large diameter pipe jacking in electric power tunnel,” Rock and Soil
    Mechanics, vol. 31, no. 9, pp. 2901–2906, 2010.
[2] J. Li, X. Huang, X. Yang, “Numerical simulation and settlement prediction of subway
    locating under cable tunnels,” Journal of Railway Science and Engineering, vol. 5, no. 1,
    pp. 68–71, 2008.
[3] S. J. Julier, J. K. Uhlmann, “Unscented filtering and nonlinear estimation,” Proceedings
    of the IEEE, vol. 92, no. 3, pp. 401–422, 2004.
[4] B. Jiang, A. P. Sample, R. M. Wistort, et al., “Autonomous robotic monitoring of
    underground cable systems,” International Conference on Icar, IEEE, pp. 673-679, 2005.
[5] M. H. Yang, R. Xiang, H. Lu, et al., “Deep networks for saliency detection via local
    estimation and global search,” Proceedings of the IEEE Conference on Computer Vision,
    Los Alamitos: IEEE Computer Society Press, pp. 3183-3192, 2015.
[6] X. H. Li, H. H. Lu, L. H. Zhang, et al., “Saliency detection via dense and sparse
    reconstruction,” Proceedings of the IEEE International Conference on Computer
    Vision, Los Alamitos: IEEE Computer Society Press, pp. 2976-2983, 2013.
[7] L. Z. Wang, L. J. Wang, H. C. Lu, et al., “Saliency detection with recurrent fully
    convolutional networks,” Proceedings of the European Conference on Computer Vision,
    Heidelberg: Springer, pp. 825-841, 2016.
[8] X. Li, L. M. Zhao, L. N. Wei, et al., “Deep Saliency: multi-task deep neural network model
    for salient object detection,” IEEE Transactions on Image Processing, vol. 25, no. 8, pp.
    3919-3930, 2016.
[9] P. Zhang, D. Wang, H. Lu, et al., “Amulet: aggregating multi-level convolutional features
     for salient object detection,” Proceedings of the IEEE International Conference on
     Computer Vision, Los Alamitos: IEEE Computer Society Press, vol. 2017, pp. 202-211,
     2017.
[10] X. Chen, H. Ma, X. Wang, et al., “Improving object proposals with multi-thresholding
     straddling expansion,” Proceedings of the IEEE Conference on Computer Vision and
     Pattern Recognition, Los Alamitos: IEEE Computer Society Press, pp. 2587-2595, 2015.
[11] R. Achanta, S. Hemami, F. Estrada, et al., “Frequency-tuned salient region detection,”
     Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Los
     Alamitos: IEEE Computer Society Press, vol. 22, no. 9/10, pp. 1597-1604, 2009.
[12] D. W. Zhang, H. Z. Fu, J. W. Han, et al., “A review of co-saliency detection algorithms:
     fundamentals, applications, and challenges,” ACM Transactions on Intelligent Systems
     and Technology, vol. 9, no. 4, pp. 1-31, 2018.
[13] X. Wang, H. M. Ma, X. Z. Chen, et al., “Edge preserving and multi-scale contextual neural
     network for salient object detection,” IEEE Transactions on Image Processing, vol. 27,
     no. 1, pp. 121-134, 2017.
[14] W. Zhang, Y. Han, Q. Huang, et al., “The fast multi-scale convolutional sparse coding
     based super-resolution for infrared image,” Journal of Computer-Aided Design &
     Computer Graphics, vol. 30, no. 10, pp. 1935-1942, 2018.
[15] Z. H. Zhao, S. P. Yang, Z. Q. Ma, “License plate character recognition based on
     convolutional neural network LeNet-5,” Journal of System Simulation, vol. 22, no. 3, pp.
     638–641, 2010.
[16] A. Tsoupos, V. M. Khadkikar, “A novel SVM technique with enhanced output voltage
     quality for indirect matrix converters,” IEEE Transactions on Industrial Electronics, vol.
     66, no. 2, pp. 832–841, 2019.
[17] Z. Xiao, S. J. Ye, B. Zhong, et al., “BP neural network with rough set for short term load
     forecasting,” Expert Systems with Applications, vol. 36, no. 1, pp. 273–279, 2009.
[18] G. B. Huang, Q. Y. Zhu, C. K. Siew, “Extreme learning machine: theory and application,”
     Neurocomputing, vol. 70, no. 1-3, pp. 489–501, 2006.