A Joint Application of Fuzzy Logic Approximation and a Deep Learning Neural Network to Build Fish Concentration Maps Based on Sonar Data

Dmitry Glukhov1 [0000-0003-4983-2919], Rykhard Bohush1 [0000-0002-6609-5810], Juho Mäkiö2 [0000-0001-9987-7600] and Tatjana Hlukhava1

1 Polotsk State University, Blokhina st., 29, Novopolotsk, Republic of Belarus, 211440
{d.gluhov, r.bogush, t.gluhova}@psu.by
2 University of Applied Sciences Emden/Leer, Constantiaplatz 4, Emden, Germany, D-26723
juho.maekioe@hs-emden-leer.de

Abstract. This paper proposes an effective method for obtaining a topographic lake map with fish concentration based on the results of intelligent sonar data processing. A special implementation of fuzzy logic is used to approximate the sonar data. The mathematical apparatus of fuzzy logic makes it possible to adjust the approximator flexibly to the conditions of the problem being solved when working with high-dimensional data. An algorithm for obtaining fish concentration maps based on the results of intelligent processing of the sonar data is also proposed. The algorithm consists of the following steps: separation of the input frame into overlapping blocks, block processing using the convolutional neural network YOLO v2, and merging of the extracted bounding boxes around one object. Experimental results for fish detection and fish concentration map building are presented.

Keywords: sonar data; fish concentration; maps of lakes; fuzzy logic; convolutional neural networks

1 Introduction

Modern tools for detecting underwater objects using ultrasound (sonars) have become widespread in solving various applied problems. A highly specialized class of sonars designed to study the relief of the lake bottom and to search for fish is called echo sounders.

Currently, there is a wide range of sonars from different suppliers. The best-known sonars are produced by Lowrance, Raymarine and Humminbird. Moreover, most modern sonars have a GPS module.
Such devices are called chartplotters. Chartplotters bind echogram data to X, Y coordinates in the Mercator projection (WGS-84/UTM coordinate system). A peculiarity of echograms is that GPS data is updated much less frequently than ultrasonic sounding data, so an individual act of acoustic sounding cannot be georeferenced on its own. Modern echogram formats therefore contain Mercator projection coordinates that change stepwise after each GPS update.

The most common echogram formats are SLG and SL2, developed by Lowrance. The SL2 format is used for multi-beam sonars that are equipped with a DownScan bottom scan function and a StructureScan structural scan (455 kHz or 800 kHz beam) and can probe with the Primary and Secondary beams (at 83 kHz and 200 kHz) simultaneously.

Several software packages are designed for processing sonar data: ReefMaster by ReefMaster Software Ltd., DrDepth (this project was purchased by Humminbird, which created the program AutoCharts on its basis), Surfer, ArcGis, GlobalMapper and others. In addition to their high cost, most geographic information systems (GIS) ignore the acoustic echo information and analyze only the water depth data. This approach does not allow creating maps of fish concentration, vegetation, large fish habitats or other water body characteristics that are extracted indirectly from the echolocation data.

Traditionally, various methods of spatial interpolation are used to construct a topographic map of the bottom from a discrete set of measurements. Geostatistical estimation methods, such as kriging, require a large amount of computation but yield interpolations that are optimal in a certain sense.

When it comes to processing sonar data, it is important to note that the data is fragmentary, limited, and often inadequate for obtaining reliable statistical estimates.
The presence of uncertainty of this kind is an additional argument in favor of soft computing. If we regard the unknown parameter as continuous, we can draw a parallel between inferring the value of the unknown parameter and approximating a function.

The idea of applying a fuzzy logic approximator to construct a bottom topographic map follows from an analogy: the set of point depth measurements can be considered as a system of knowledge about the properties and structure of the water body, and each acoustic sounding can be described in terms of formal logic.

Deploying state-of-the-art image recognition tools based on deep learning neural networks makes it possible to use the echo sounder for new applied problems, such as tourism, ecology, nature protection and search tasks.

In [1] the authors propose sliding window filters with contour detection to extract low-level features and fish contours from echo images. This approach cannot adapt to the various shapes of fish schools and bottom artifacts, because the filter kernels are not specialized for complicated forms. The approach presented in [2] uses sliding window filtering to extract objects from the echo image. First, a median filter is used for noise removal; then a low-pass filter with an adaptive threshold separates tracks with fish from the background noise level; finally, a perimeter filter removes small regions with echo pulses caused by stochastic noise and bottom structure. The described method can give false negative results for fish schools of complicated form. The algorithm presented in [3] uses a convolutional neural network (CNN) to extract information about fish localization from the echo image. It works with sonar images of a moving agent obtained by forward-looking sonar. The authors used the CNN YOLO for binary classification.
However, for high-resolution images and small fish this algorithm can give many false negative results. Schools of fish can also be missed, because they were not taken into account when training the CNN.

Traditionally, different kinds of interpolation methods are used to build a surface topology map from a number of discrete measurements. Geostatistical estimation methods [4], like kriging, have large computational costs, but they can achieve optimal interpolation. In the case of echo data processing, it is important to take into account that the data are limited and fragmentary, so a proper evaluation of statistical characteristics from these values is hard. GPS data is updated less often than the echo data, which is why georeferencing is performed for groups of echo sounding points. These groups are not evenly distributed over the water body, which is why we use fuzzy logic for the calculations.

We propose a novel approach to generating fish concentration maps from sonar data using a CNN that can adapt to different environmental conditions. The presented approach to detecting fish or other objects in sonar images is based on the following steps: 1) separation of the input image into overlapping blocks; 2) block processing using the CNN YOLO v2; and 3) merging of the extracted bounding boxes around one object. After fish detection, to construct maps of the distribution of features over the lake, we propose a novel method for approximating the GPS-referenced CNN results based on an original implementation of fuzzy logic.

2 Fish detection using CNN

Deep machine learning systems provide strong performance in object detection and classification tasks. Object detection systems have to deliver the following: accuracy, precise extraction of regions of interest (RoIs) in images, classification with minimal deviation, and speed.
Typical image processing systems (optical character recognition systems, video fire detection systems and others) usually include the following steps: preprocessing, feature extraction, classification, and context processing [5, 6]. Machine learning systems that simulate the human brain can solve detection and classification problems as well as or even better than humans, and they do so faster. CNNs are increasingly used for image processing in various practical areas. Unlike traditional networks, CNNs have a reduced number of parameters and, instead of processing the whole image, can process only the extracted feature map, which takes the image topology into account and is robust to affine transformations.

We analyzed well-known neural network architectures, such as AlexNet [7], Faster R-CNN [8], GoogLeNet [9], ResNet [10] and others. These networks process the whole image as a feature map, which can make computation faster than processing the whole image directly. According to [11], YOLO is among the fastest object detection systems and compares favourably with Faster R-CNN. In [12] the authors propose YOLO v2 and YOLO9000, which are modifications of YOLO. Better segmentation and classification were achieved by: 1) batch normalization from [13]; 2) a high-resolution classifier; 3) convolution with anchor boxes; 4) use of k-means; 5) direct location prediction; 6) RPN usage; 7) fine-grained features; 8) multi-scale training; and 9) the novel classification model Darknet-19.

The CNN Darknet-19 has 19 convolutional and five max-pooling layers. YOLO9000 can distinguish more than 9000 classes. At training time, instead of fixing the input image size, the network changes it every few iterations: every ten batches YOLO v2 randomly chooses a new image dimension size. Since the model down-samples by a factor of 32, the size is drawn from the following multiples of 32: {320, 352, … , 608}.
The network is then resized to that dimension and training continues.

Since the sonar moves along a complex trajectory with varying speed while scanning the lake, the echogram has to be normalized. For this purpose, an algorithm was developed that converts the echogram to metric coordinates along the length of the sonar track. Thanks to the corresponding stretching/compression of the echogram, all objects of the acoustic echo can be represented on a single scale (Fig. 1).

Fig. 1. Example of the echogram normalization process

The input images are scaled before CNN processing, which means that tiny objects (fish) can be missed. To solve this problem, we process patches of the echo image to detect tiny objects precisely and then concatenate the outputs with post-processing.

We propose an effective algorithm for fish detection in sonar images based on the following steps: separation of the input frame into overlapping blocks, block processing using the CNN YOLO v2, and merging of the extracted bounding boxes around one object.

The input image $I$ of size $H \times W$ is divided into overlapping blocks $C_{i,j}$ of size $c_h \times c_w$, $i = 0, \dots, H/c_h - 1$, $j = 0, \dots, W/c_w - 1$. The overlap size can vary with the input frame resolution and the relative size of the smallest objects.

Each block is passed to the CNN YOLO v2 [12], in which objects are localized using sequences of convolutional filters. YOLO v2 uses convolution with anchor boxes, as in Faster R-CNN, and runs k-means clustering (k=5) to obtain good priors for the predicted objects. After YOLO v2 processing, every block $C_{i,j}$ has bounding boxes represented by the top-left corner coordinates $B_{i,j}(x_1, y_1)$, the bottom-right corner coordinates $B_{i,j}(x_2, y_2)$, the object class, and the probability value.

In the next step, block post-processing, we search for neighboring RoIs whose combined overlapped region lies closer than 20% from the block edge.
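The block splitting and box merging steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`block_origins`, `split_into_blocks`, `to_frame`, `iou`, `merge_boxes`), the 0.5 merge threshold and the rule of replacing two overlapping boxes with their enclosing box are all hypothetical. The YOLO v2 call itself and the 20% block-edge test are omitted.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2): top-left, bottom-right


def block_origins(length: int, block: int, overlap: int) -> List[int]:
    """Start offsets of overlapping blocks that cover [0, length)."""
    if not 0 <= overlap < block:
        raise ValueError("overlap must satisfy 0 <= overlap < block")
    if length <= block:
        return [0]
    step = block - overlap
    origins = list(range(0, length - block, step))
    origins.append(length - block)  # last block is aligned with the frame edge
    return origins


def split_into_blocks(height: int, width: int, ch: int, cw: int,
                      overlap_y: int, overlap_x: int) -> List[Tuple[int, int]]:
    """(top, left) corners of the ch x cw blocks covering an H x W frame."""
    return [(top, left)
            for top in block_origins(height, ch, overlap_y)
            for left in block_origins(width, cw, overlap_x)]


def to_frame(box: Box, top: int, left: int) -> Box:
    """Shift a block-local bounding box into whole-frame coordinates."""
    x1, y1, x2, y2 = box
    return (x1 + left, y1 + top, x2 + left, y2 + top)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def merge_boxes(boxes: List[Box], threshold: float = 0.5) -> List[Box]:
    """Greedy merge: boxes whose IoU reaches the (assumed) threshold are
    replaced by the box enclosing both."""
    merged: List[Box] = []
    for box in boxes:
        for k, kept in enumerate(merged):
            if iou(box, kept) >= threshold:
                merged[k] = (min(box[0], kept[0]), min(box[1], kept[1]),
                             max(box[2], kept[2]), max(box[3], kept[3]))
                break
        else:
            merged.append(box)
    return merged
```

In use, each block would be cropped at the corners returned by `split_into_blocks`, passed to the detector, and its boxes shifted with `to_frame` before `merge_boxes` is applied to the combined list.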
For neighboring RoIs found near the block edges, we calculate the IoU (Intersection over Union), which describes the overlap of two regions:

$$\mathrm{IoU} = \frac{|B_1 \cap B_2|}{|B_1 \cup B_2|}, \tag{1}$$

where $B_1$ and $B_2$ are the two regions and $|\cdot|$ denotes area.

3 Maps building based on fuzzy logic

The idea of applying a "fuzzy logic approximator" to build a bottom topographic map or a feature map is based on the following analogy: the set of separate depth measurements can be represented as a knowledge-based system containing information about the features and structure of the water body. Each echo sounding is described in formal logic as:

IF coordinates X, Y AND time t THEN depth D, water temperature T and other parameters.

In this work the universal adaptive approximator is a specific realization of the fuzzy logic mathematical tools developed in [14].

If $L(p, p_i)$ is the operator measuring the distance between two points, then, by analogy with a production system, knowledge about a fragment of the water body propagates to neighboring fragments according to a unimodal function that has its maximum at the specified point. We propose the following membership function of the distance from the nodal approximation point $p_i$ to the point $p$:

$$\mu(p, p_i) = \frac{1}{L(p, p_i)^{n}} \;\Big|\; L(p, p_i) \le L_{limit}. \tag{2}$$

When defuzzification by the COG (Center Of Gravity) method is applied, the desired feature value at the unknown point $p$ can be calculated as:

$$p.z = \frac{\sum_i \mu(p, p_i)\, p_i.z}{\sum_i \mu(p, p_i)}. \tag{3}$$

In addition, we present modifications of the defuzzification methods that balance the influence of neighboring points by means of space discretization: the group of rules falling into one discretization interval is replaced by the single rule with the maximum membership function at point $p$.

Fuzzy logic approximation, in contrast to traditional approximation methods, can take several predicates into account and build complex conditions. For example, an approximation condition can use not only the depth but also information about the structure of the lake bottom, in order to correct abrupt depth changes caused by bottom objects (artifacts).
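The center-of-gravity estimate of (2) and (3) can be illustrated with a short sketch. It assumes that the distance operator L is the Euclidean distance in the map plane and uses illustrative values for n and L_limit; the names `membership` and `cog_estimate` are hypothetical and are not taken from [14].

```python
import math
from typing import List, Optional, Tuple

Node = Tuple[float, float, float]  # (x, y, z): node position and measured value


def membership(dist: float, n: float, l_limit: float) -> float:
    """Eq. (2): 1 / L^n within the radius L_limit, zero outside it."""
    if dist > l_limit:
        return 0.0
    if dist == 0.0:
        return math.inf  # the query point coincides with a node
    return 1.0 / dist ** n


def cog_estimate(x: float, y: float, nodes: List[Node],
                 n: float = 2.0, l_limit: float = 50.0) -> Optional[float]:
    """Eq. (3): membership-weighted mean of node values at point (x, y).

    Returns None when no node lies within l_limit of the query point.
    """
    num = 0.0
    den = 0.0
    for nx, ny, nz in nodes:
        dist = math.hypot(x - nx, y - ny)  # assumed Euclidean L(p, p_i)
        if dist == 0.0:
            return nz  # exact hit: return the measured value
        mu = membership(dist, n, l_limit)
        num += mu * nz
        den += mu
    return num / den if den > 0.0 else None
```

With two nodes of depth 2 and 4 placed 10 units apart, the estimate at the midpoint is 3, and it moves toward the value of whichever node is closer as the query point shifts.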
A condition that takes the bottom structure into account can be written as the rule:

IF coordinates X, Y AND bottom structure without artifacts THEN depth D, water temperature T and other parameters.

We also propose modifications of the defuzzification methods that eliminate the disproportionate influence of nearby points. This is done by discretizing the space and replacing the influence of the group of rules that fall into a single discretization interval with the influence of the one rule that has the maximum membership function at point $p$.

An interesting way to eliminate the effect of unevenly located nodes is angular discretization. We introduce the operator $angle(p, p_i)$, which returns the number of the circle sector into which the angle between the unknown point $p$ and the approximation node $p_i$ falls. The nearest point for each sector is defined as follows:

$$PA_{angle(p, p_i)} = p_i \;\Big|\; \forall p_j \in P,\; p_j \ne p_i,\; angle(p, p_j) = angle(p, p_i),\; \mu(p, p_i) \ge \mu(p, p_j),\; L(p, p_i) \le L_{limit}. \tag{4}$$

The value of the unknown parameter, for example the depth $z$, for an 8-sector split is:

$$p.z = \frac{\sum_{k=1}^{8} \mu(p, PA_k)\, PA_k.z}{\sum_{k=1}^{8} \mu(p, PA_k)}. \tag{5}$$

If the angular discretization interval tends to 0, the output of the center of gravity method takes the form:

$$p.z = \frac{\int_{-\pi}^{\pi} \varphi(p, \alpha)\, z(\alpha)\, d\alpha}{\int_{-\pi}^{\pi} \varphi(p, \alpha)\, d\alpha}, \tag{6}$$

where $\varphi(p, \alpha)$ is the piecewise-linear interpolation in polar coordinates of the values of the maximal membership functions of the nearest points in the direction $\alpha$, and $z(\alpha)$ is the piecewise-linear interpolation in polar coordinates of the depth values of the points with the maximum membership function value.

The key difference between fuzzy logic approximation and traditional approximation methods is the possibility of taking several predicates into account. For example, we formulate an approximation condition that includes both the depth information and the bottom structure information in order to eliminate the effect of depth jumps caused by bottom artifacts.
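The 8-sector estimate of (4) and (5) can be sketched as follows. The sketch assumes a Euclidean distance operator, the inverse-power membership function of (2), and sectors counted counter-clockwise from the positive X axis; the names `sector_index` and `sector_estimate` and the default parameter values are hypothetical.

```python
import math
from typing import Dict, Iterable, Optional, Tuple

Node = Tuple[float, float, float]  # (x, y, z): node position and measured value


def sector_index(px: float, py: float, nx: float, ny: float,
                 sectors: int = 8) -> int:
    """angle(p, p_i): number of the circle sector (0 .. sectors - 1) that
    contains the direction from p to the node."""
    theta = math.atan2(ny - py, nx - px) % (2.0 * math.pi)
    return min(int(theta / (2.0 * math.pi / sectors)), sectors - 1)


def sector_estimate(px: float, py: float, nodes: Iterable[Node],
                    sectors: int = 8, n: float = 2.0,
                    l_limit: float = 50.0) -> Optional[float]:
    """Eqs. (4)-(5): keep only the node with the largest membership value
    in each sector, then take the membership-weighted mean of those nodes."""
    best: Dict[int, Tuple[float, float]] = {}  # sector -> (mu, z)
    for nx, ny, nz in nodes:
        dist = math.hypot(nx - px, ny - py)
        if dist == 0.0:
            return nz  # the query point coincides with a node
        if dist > l_limit:
            continue  # outside the influence radius L_limit
        mu = 1.0 / dist ** n
        k = sector_index(px, py, nx, ny, sectors)
        if k not in best or mu > best[k][0]:
            best[k] = (mu, nz)
    den = sum(mu for mu, _ in best.values())
    if den == 0.0:
        return None
    return sum(mu * z for mu, z in best.values()) / den
```

The design intent is that a dense cluster of soundings on one side of the query point contributes a single representative per sector, so it cannot outweigh a lone sounding at the same distance on the opposite side.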
When several predicates are combined, their values need to be normalized, so we propose the following modification of the membership function:

$$\mu(p, p_i) = \left(1 - \frac{L(p, p_i)}{L_{limit}}\right)^{n} \;\Big|\; L(p, p_i) \le L_{limit}. \tag{7}$$

Let $R(p)$ denote the depth irregularity at point $p$. We normalize this value and estimate the "bottom without artifacts" model as:

$$\mu_R(p) = \left(1 - \frac{R(p) - R_{min}}{R_{max} - R_{min}}\right)^{m}. \tag{8}$$

Then:

$$p.z = \frac{\int \max_i \min\bigl(\mu(p, p_i),\, \mu_R(p)\bigr)\, z\, dz}{\int \max_i \min\bigl(\mu(p, p_i),\, \mu_R(p)\bigr)\, dz}. \tag{9}$$

As a result, we obtain the classical minimax logical inference of the unknown parameter value and a bottom approximation with low artifact influence.

4 Experimental Results

For YOLO v2 we built our own training set of about 80 000 objects. We manually selected ground truth bounding boxes around RoIs using the VOTT (Visual Object Tagging Tool) software [15]. VOTT produces ground truth coordinates and converts them into the YOLO format. Using this program, we additionally created annotation files. We predicted five classes of objects: "fish", "grass", "school of fish", "predator", "bottom fish". Fig. 2 depicts ground truth boxes in VOTT.

Fig. 2. Ground truth bounding boxes in VOTT: a, b, c) "fish"; d, e) "predator"; f, g) "school of fish"; h, i) "grass"; k, l, m, n) "bottom fish"

All images were taken on the river Western Dvina and on lakes in the Republic of Belarus, with a maximum depth of 12 meters for the river and 38 meters for the lakes. A double-beam (200 kHz and 450 kHz) Lowrance HOOK 4 echo sounder was used. Fig. 3 depicts the resulting classification after YOLO v2 processing.

Fig. 3. Fish detection and classification results

The presented algorithm has an accuracy of 72.1% and a low percentage of false positive results when fish are present. However, as shown in Fig. 4, our approach cannot properly distinguish the classes "grass" and "school of fish", especially when their shapes are similar.

Fig. 4. Example of incorrect classification

The accuracy of the approximation increases with the number and density of approximation nodes. It is therefore important that the sonar passes over all the most complex and characteristic sections of the bottom.

A special type of track is the lake contour, imported in KMZ format from well-known GIS systems such as Google Earth. The contour is a track with zero-depth points (Fig. 5). Similar contours are used to model the contours of islands. Incorrect points of the sonar tracks can be deleted, as can some points of the contours, in order to model the open contour of a river bed. The same approach is used to construct the approximation for other features (Fig. 6).

Fig. 5. Example of map building taking into account the contour of the lake

Fig. 6. Example of a fish concentration map

5 Conclusion

A method for obtaining topographic maps of lakes, maps of fish concentration and maps of predator location based on the results of intelligent sonar data processing is presented. The presented algorithm detects the classes "fish", "grass", "school of fish", "predator" and "bottom fish" in sonar images. The algorithm includes the following steps: separation of the input frame into overlapping blocks, block processing using the CNN YOLO v2, merging of the extracted bounding boxes around one object, and fish concentration map building. To construct maps of the distribution of features over the lake, we propose a novel method for approximating the GPS-referenced CNN results based on an original implementation of fuzzy logic. Our method has an accuracy of 72.1% and a low percentage of false positive results when fish are present. To increase the accuracy, the dataset for CNN training needs to be expanded significantly.

References

1. Balk, H., Lindem, T.:
Improved fish detection probability in data from split-beam sonar. Aquatic Living Resources. 13(5): 297–303 (2000) doi: 10.1016/S0990-7440(00)01079-2
2. Helge, B., Torfinn, L.: Improved fish detection probability in data form split-beam sonar: https://slides.tips/improved-fish-detection-probability-in-data-form-split-beam-sonar.html
3. Kim, J., Yu, S.-C.: Convolutional neural network-based real-time ROV detection using forward-looking sonar image. Autonomous Underwater Vehicles (AUV), IEEE/OES, pp. 396–400 (2016) doi: 10.1109/AUV.2016.7778702
4. Krivoruchko, K.: Spatial Statistical Data Analysis for GIS Users. Redlands, Esri Press (2011)
5. Demant, C., Garnica, C., Streicher-Abel, B.: Industrial Image Processing: Visual Quality Control in Manufacturing. Heidelberg, Springer (2013)
6. Shiping, Y., Zhican, B., Huafeng, C., Bohush, R., Ablameyko, S.: An effective algorithm to detect both smoke and flame using color and wavelet analysis. Pattern Recognition and Image Analysis. 27(1): 131–138 (2017) doi: 10.1134/S1054661817010138
7. Krizhevsky, A., Sutskever, I., Hinton, G. E.: ImageNet classification with deep convolutional neural networks. Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS'12), vol. 1, pp. 1097–1105 (2012)
8. Ren, Sh., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence. 39(6): 1137–1149 (2017) doi: 10.1109/TPAMI.2016.2577031
9. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 27-30 June 2016, pp. 2818–2826 (2016) doi: 10.1109/CVPR.2016.308
10. He, K., Zhang, X., Ren, Sh., Sun, J.: Deep Residual Learning for Image Recognition. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 27-30 June 2016, pp. 770–778 (2016) doi: 10.1109/CVPR.2016.90
11. Redmon, J., Divvala, S. K., Girshick, R. B., Farhadi, A.: You Only Look Once: Unified, Real-Time Object Detection. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 27-30 June 2016, pp. 779–788 (2016) doi: 10.1109/CVPR.2016.91
12. Redmon, J., Farhadi, A.: YOLO9000: Better, Faster, Stronger. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 21-26 July 2017, pp. 6517–6525 (2017) doi: 10.1109/CVPR.2017.690
13. Ioffe, S., Szegedy, Ch.: Batch normalization: accelerating deep network training by reducing internal covariate shift. Proceedings of the 32nd International Conference on Machine Learning, Microtome Publishing, 6-11 July 2015, pp. 448–456 (2015)
14. Glukhov, D.: Dynamic expert system by fuzzy inference rules to automations an examination of complex objects. Budownictwo i Inzynieria, Srodowiska, pp. 105–109 (1998)
15. Visual Object Tagging Tool: An electron app for building end to end Object Detection Models from Images and Videos: https://github.com/Microsoft/VoTT