Utilization of Machine Learning in Recognition of Rocks and Mock-mines by Sonar Chirp Signals

Yurii Kryvenchuk, Mykhailo Dmytryshyn

COLINS-2024: 8th International Conference on Computational Linguistics and Intelligent Systems, April 12–13, 2024, Lviv, Ukraine
yurii.p.kryvenchuk@lpnu.ua (Y. Kryvenchuk); mikhailo2002dm@gmail.com (M. Dmytryshyn)
ORCID: 0000-0002-2504-5833 (Y. Kryvenchuk); 0009-0001-5627-732X (M. Dmytryshyn)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

2. Related Works

3. Methods

3.1. Dataset

The research utilized the "Connectionist Bench (Sonar, Mines vs. Rocks)" dataset from the UCI Machine Learning Repository. The dataset is a CSV file of sonar patterns obtained by bouncing sonar signals off a metal cylinder and off rocks, each explored across various angles and conditions. The transmitted sonar signals are frequency-modulated chirps of ascending frequency, captured from diverse aspect angles, spanning 90 degrees for the cylinder and 180 degrees for the rock. Each pattern consists of 60 numerical values in the range 0.0 to 1.0. These numbers denote the energy within specific frequency bands, integrated over defined time periods. Notably, the integration aperture for higher frequencies occurs later in time, because those frequencies are transmitted later in the chirp. The label assigned to each record is "R" for rocks or "M" for mines (metal cylinders). While the records are ordered by ascending aspect angle, the labels do not directly encode the angle information.

3.2. Data processing and organization methods

In this work, Label Encoding is employed to convert the class labels (categories) into numerical values. For the task of recognizing rocks ("R") and mock-mines ("M") from sonar chirp signals, the classes can be encoded into numerical values. For example, given a column with class labels such as:

["R", "M", "R", "R", "M", "M", "R", "M", "R", "R"]

Label Encoding transforms these classes into numerical values:

[1, 0, 1, 1, 0, 0, 1, 0, 1, 1]

Here, "R" has been assigned the value 1 and "M" the value 0. This conversion allows machine learning algorithms to work with the data, as many algorithms require numerical values for both input and output. Label Encoding can be performed using libraries like scikit-learn in Python, utilizing the LabelEncoder class; a minimal sketch is shown below. This encoding is particularly useful when dealing with categorical data in machine learning models.
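The following sketch illustrates the loading and label-encoding steps. The file name sonar.csv is an illustrative assumption; the file is assumed to contain 60 feature columns followed by the class-label column, with no header row.

    import pandas as pd
    from sklearn.preprocessing import LabelEncoder

    # Load the sonar patterns: 60 energy values per row plus a class label.
    df = pd.read_csv("sonar.csv", header=None)
    X = df.iloc[:, :-1].values      # 60 spectral-energy features
    y_raw = df.iloc[:, -1].values   # "R" (rock) or "M" (mine)

    # LabelEncoder assigns integer codes in alphabetical order,
    # so "M" becomes 0 and "R" becomes 1.
    encoder = LabelEncoder()
    y = encoder.fit_transform(y_raw)
    print(encoder.classes_)         # ['M' 'R']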
Ensemble methods, specifically AdaBoost-SAMME, were employed to leverage the strengths of multiple weak learners. Decision trees, logistic regression, and random forests were individually used as base classifiers within the ensemble framework to assess their impact on classification accuracy. Various neural network architectures were explored, and techniques such as dropout and L2 regularization were applied to mitigate overfitting and enhance generalization performance. The performance of each model was assessed by its mean accuracy under cross-validation. Sketches of the ensemble setup and of a regularized network are given below.
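A minimal sketch of the AdaBoost-SAMME ensembles with the three base classifiers, scored by cross-validated accuracy. It reuses X and y from the loading sketch above; the hyperparameter values are illustrative assumptions, not the configuration behind the results in Table 1.

    from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    base_learners = {
        "decision tree": DecisionTreeClassifier(max_depth=1),
        "logistic regression": LogisticRegression(max_iter=1000),
        "random forest": RandomForestClassifier(n_estimators=50),
    }

    for name, base in base_learners.items():
        # algorithm="SAMME" selects the discrete SAMME boosting variant;
        # the "estimator" argument is named "base_estimator" in scikit-learn < 1.2.
        model = AdaBoostClassifier(estimator=base, algorithm="SAMME", n_estimators=100)
        scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
        print(f"AdaBoost-SAMME ({name}): {scores.mean():.2%}")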
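Likewise, a minimal sketch of a dense network combining dropout and L2 weight penalties, assuming a Keras stack (cf. [2]); the layer sizes, dropout rate, and L2 factor are illustrative assumptions rather than the authors' reported architecture.

    from tensorflow import keras
    from tensorflow.keras import layers, regularizers

    model = keras.Sequential([
        layers.Input(shape=(60,)),              # one input per frequency band
        layers.Dense(64, activation="relu",
                     kernel_regularizer=regularizers.l2(1e-3)),
        layers.Dropout(0.3),                    # randomly silence 30% of units during training
        layers.Dense(32, activation="relu",
                     kernel_regularizer=regularizers.l2(1e-3)),
        layers.Dropout(0.3),
        layers.Dense(1, activation="sigmoid"),  # single output for the binary rock/mine decision
    ])

    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, y, epochs=100, batch_size=16, validation_split=0.2, verbose=0)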
3.3. ML Methods

In this study, a diverse set of machine learning algorithms has been employed to discern patterns and classify the sonar signals. The chosen algorithms are versatile enough to handle the complexity of the data and offer a comprehensive exploration of the recognition task. The following algorithms have been applied:

3.4. Overfitting

4. Experiment

4.1. Dataset Preprocessing

4.2. Evaluation

5. Results

Table 1: Model accuracies

    Algorithm                                      Accuracy
    AdaBoost-SAMME (decision tree)                 71.12%
    AdaBoost-SAMME (logistic regression)           79.76%
    AdaBoost-SAMME (random forest)                 87.50%
    Decision Tree                                  73.52%
    Decision Tree (min cost complexity pruning)    71.21%
    Gaussian process (Laplace approximation)       82.76%
    K-nearest neighbors vote                       79.81%
    Logistic Regression                            76.48%
    Logistic Regression (L1)                       77.93%
    Logistic Regression (L2)                       75.98%
    Logistic Regression (L1 and L2)                77.43%
    Multi-layer Perceptron                         80.31%
    Multi-layer Perceptron (L2)                    80.93%
    Neural Network                                 85.45%
    Neural Network (dropout)                       87.64%
    Neural Network (L2)                            86.61%
    Neural Network (dropout and L2)                88.45%
    Random Forest                                  85.57%

Figure 2: Dropout accuracies (x axis: dropout weights)

Figure 3: L2 regularization accuracies (x axis: regularization factors)

6. Discussions

6.1. Effectiveness of methods

6.2. Comparison with Previous Research

Conclusions

References

[1] X. Lin, R. Dong, Z. Lv, Deep Learning-Based Classification of Raw Hydroacoustic Signal, Journal of Marine Science and Engineering 11 (2023). https://doi.org/10.3390/jmse11010003
[2] J. Brownlee, Dropout Regularization in Deep Learning Models with Keras, 2022. URL: https://machinelearningmastery.com/dropout-regularization-deep-learning-models-keras/
[3] Y. Steiniger, D. Kraus, T. Meisen, Survey on deep learning based computer vision for sonar imagery, Engineering Applications of Artificial Intelligence 114 (2022). https://doi.org/10.1016/j.engappai.2022.105157
[4] D. Karimanzira, H. Renkewitz, D. Shea, Object Detection in Sonar Images, Electronics 9 (2020). https://doi.org/10.3390/electronics9071180
[5] J. Fernandes, N. Junior, Deep Learning Models for Passive Sonar Signal Classification of Military Data, Remote Sensing 14 (2022). https://doi.org/10.3390/rs14112648
[6] OpenGenus, Advantages and Disadvantages of Logistic Regression, 2024. URL: https://iq.opengenus.org/advantages-and-disadvantages-of-logistic-regression/
[7] IBM, What is a Decision Tree?, 2024. URL: https://www.ibm.com/topics/decision-trees
[8] A. Nagpal, L1 and L2 Regularization Methods, 2017. URL: https://towardsdatascience.com/l1-and-l2-regularization-methods-ce25e7fc831c
[9] Y. Tian, Y. Zhang, A comprehensive survey on regularization strategies in machine learning, Information Fusion (2021). https://doi.org/10.1016/j.inffus.2021.11.005
[10] J. Shubham, An Overview of Regularization Techniques in Deep Learning, 2023. URL: https://www.analyticsvidhya.com/blog/tag/regularization-in-deep-learning/
[11] L. Ruhela, Dropout Regularization, 2023. URL: https://ruhelalakshya.medium.com/dropout-regularization-b27885b4c55b
[12] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Journal of Machine Learning Research 15 (2014) 1929–1958.
[13] GeeksforGeeks, Logistic Regression in Machine Learning, 2023. URL: https://www.geeksforgeeks.org/understanding-logistic-regression/
[14] W. Gong, J. Tian, J. Liu, Underwater Object Classification Method Based on Depthwise Separable Convolution Feature Fusion in Sonar Image, Applied Sciences 12 (2022). https://doi.org/10.3390/app12073268
[15] H. Yadav, Dropout in Neural Networks, 2022. URL: https://towardsdatascience.com/dropout-in-neural-networks-47a162d621d9
[16] IBM, What is overfitting?, 2024. URL: https://www.ibm.com/topics/overfitting