Neural Networks to Recognize Ships on Satellite Images

Svitlana Popereshnyak¹, Anastasiya Vecherkovskaya², and Liubov Ivanova¹

¹ National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute," 37, Prospect Beresteiskyi, Kyiv, 03056, Ukraine
² Taras Shevchenko National University of Kyiv, 24, Bohdana Gavrylyshyn str., Kyiv, 02000, Ukraine

CPITS-2024: Cybersecurity Providing in Information and Telecommunication Systems, February 28, 2024, Kyiv, Ukraine
EMAIL: spopereshnyak@gmail.com (S. Popereshnyak); vecherkovskaia90@gmail.com (A. Vecherkovskaya); liubov.ivanova555@gmail.com (L. Ivanova)
ORCID: 0000-0002-0531-9809 (S. Popereshnyak); 0000-0003-2054-2715 (A. Vecherkovskaya); 0000-0002-9082-928X (L. Ivanova)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

Abstract

In the course of the work, various digital image processing algorithms were analyzed in detail, with special attention to their use in recognizing ships on satellite images. Software was developed that solves this recognition task with high precision using a specially designed and trained convolutional network. The development stage also included experimental studies and a performance evaluation of the solution, which allows us to determine objectively its efficiency and potential for this particular task. The results can help improve the quality and speed of ship recognition on satellite images, which is important in various fields, including maritime and environmental monitoring. The general methodology and the developed algorithms can also be applied in other areas of image processing and computer vision. As a result of our research, we are confident in the effectiveness and prospects of using these developments for object recognition on large volumes of satellite images.

Keywords
Neural networks, machine learning, software.

1. Introduction

Every year, a huge amount of parallel work and data that needs to be processed in a certain way appears in various fields. These tasks are of the same type, repetitive, or require constant human concentration to control or search for an object. Various software products and machine learning algorithms are used to simplify people's work and save time. Today, machine learning algorithms are in active use.

Many scientists have studied the use of neural networks to solve image recognition problems. The paper [1] reviews the main methods for solving the computer vision problems of classification, segmentation, and image processing implemented in CV systems. In [2], the convolutional properties of an autoencoding neural network for object detection in an image are considered. For training and testing, datasets were generated in the form of two-dimensional images with three color channels.

Ship images have been studied by scientists from all over the world. In particular, [3] presents a sequence of image processing algorithms suitable for detecting and classifying ships from nadir panchromatic electro-optical imagery. In [4], an algorithm was developed to classify ships according to size using image processing; the image of the ship was captured by a stationary camera.

Classification and segmentation of ships in satellite images will help in searching for objects without human intervention: there are many seas and oceans, and people will not need to look through every square kilometer to find a ship. This will help to find objects faster and reduce the cost of human labor. It will also help to monitor the delivery time of ships carrying cargo, or to control water borders so that an unwanted object does not cross certain boundaries. These methods will also help to track and locate enemy ships [5–7].

The work aims to study algorithms and develop software for searching for ships in a satellite image of a sea area [8, 9].

2. General Overview of Algorithms and their Comparison

Convolutional neural networks, which are specially designed to handle large amounts of data such as images, can be used to classify and segment digital images. They use convolution to detect local features in images and pooling to reduce the dimensionality of the image. They can be successfully used to classify objects in an image and to segment the image into separate clusters.

Convolutional neural networks are a type of neural network commonly used for image and video processing. These networks automatically detect image features and characteristics such as borders, shapes, and textures.

The main difference between convolutional neural networks and fully connected networks is that convolutional networks use convolutions to process input images instead of treating each pixel of an image as a separate input (Fig. 1).

Figure 1: Reducing the size of a digital image using convolution

In Fig. 1, the bottom square is the input photo, and the top one is the new representation of the photo after it has passed through the convolution (kernel). The lines connecting the squares represent the convolution that transforms the objects. In other words, with the help of convolution, we reduce the dimensionality of a digital image for a particular purpose, which allows us to keep important features and discard unimportant ones.

Usually, a convolution is applied more than once. To make the architecture of a neural network easier to understand, we use the term neural layer, which denotes a certain stage of a neural network.

Convolutions are filters that slide over the input image and perform multiplication and summation operations to create a feature map (Fig. 2).

Figure 2: Using filters

The left side of Fig. 2 shows how convolutions work with an image. The size of the photo is 32×32×3, where 32×32 is the size in pixels in the horizontal and vertical directions, and 3 is the number of channels (referred to here as filters), one per color: red, green, and blue.

The letter $w$ denotes the convolution filter itself, which here has size 5×5×3. The size of the convolution is specified by the programmer.

The right side shows the resulting feature map, each value of which is computed as

$$ w^{T} x + b, \tag{1} $$

where $w^{T}$ is the transposed vector of filter weights, $x$ is the part of the image covered by the filter, and $b$ is the bias, a value that the model learns during training.

The main advantage of convolutional networks is that they automatically detect and use local features in an image, such as corners, edges, and textures. This reduces the number of parameters that need to be trained and the computational complexity compared to fully connected neural networks, and provides faster and more efficient performance. In addition, convolutional neural networks learn useful features automatically, reducing the need to select features manually.

The disadvantages of convolutional networks include high computational complexity and a tendency to overfit the training data, especially when there is not enough data to train on. They require a large amount of training data. Convolutional networks can also have a complex architecture, which makes them difficult to understand, interpret, and develop.

Compared to fully connected networks, convolutional networks are usually better for image processing because of their specialized filters that can detect different types of features. This reduces the number of parameters that need to be trained and improves training efficiency.

Autoencoders can also be used to segment images and to obtain embeddings for image classification. In addition, contrastive learning can be used to train embeddings for image classification.

All of these methods can be successfully used for image classification and segmentation, depending on the specifics of the task and the availability of data.

Below is a comparative table of different algorithms (Table 1):

Table 1
Advantages and disadvantages of different algorithms

| Algorithm | Advantages | Disadvantages |
|---|---|---|
| Random forest | Fast learner. No need for retraining. Interpretability. | Cannot segment the image; it can only classify. |
| Boosting | Works well with big data. Quite accurate in classification. | Overfitting and a high computing load are possible. |
| Fully connected neural networks | With the right design, offers high precision and the ability to parallelize processing. | Prone to overfitting. Resource requirements. Segmentation requires a large number of layers, which leads to the vanishing gradient problem. |
| Autoencoders and embedders | Also work well with noise and offer image generation capabilities, reducing the need for data pre-processing. | Resource requirements and data quality depend on data size. |
| Contrastive learning | No need for data labels, efficient use of data, increased resilience to change, and less dependence on model architecture. | High resource requirements, difficult to set up, requires a lot of data. |
| Convolutional neural networks | Effective because they were created for image classification and segmentation. Multi-level learning; with the right architecture, very accurate. | Requires a large amount of data and computing resources; hard to interpret; data dependence. |

After reviewing the types of algorithms for classification and segmentation, and since neural networks can be assembled from blocks like a construction kit, we chose the convolutional neural network as the basis for segmenting the image into 2 classes: ship and non-ship. To do this, we need to create an autoencoder that reduces the size of the input image and then increases it again, leaving only ships in the image. This approach suits the task of this work best; in addition, a large number of digital images is available, and a GPU accelerator (cloud solutions) will be used to train the algorithm, which speeds up the learning process many times over.

3. Input Data Analysis

The main type of input data is digital photos taken from a satellite, stored in JPG format. There is also a second type of data: masks and coordinates of the ships in the photos, if any are present, stored in CSV format. The first type of data, photographs, looks like this (Fig. 3):

Figure 3: View of input digital images

These photos show that the size of the ships and their number vary. There are also photos with no ships but with other objects, such as islands, piers, or a part of the land. The photos also show that the color of the water differs from image to image.

All photos are 768 by 768 pixels, but the program also handles the case of converting another size to this one. The labels are represented as masks (Fig. 4).

Figure 4: The appearance of masks for digital images

The total number of objects to be studied is 192555 photos, but the program artificially creates additional images from the initial set; for example, it takes one image and flips it vertically or horizontally. This adds more objects, but the model may overfit on this set, so a number of photos that the model has never seen must be set aside immediately. Therefore, in addition to the 192555 objects, we took 15600 more to test the model.

When training a model, the distribution of objects is very important. Class imbalance can occur, as in our case (Fig. 5). This is critical because it is easier for the model not to find ships than to find them, so different methods need to be used. We first reduced the number of photos without ships by about a factor of 2 and then generated a large number of artificial images from the digital images that contain ships.

Figure 5: Distribution of images with and without ships

The number of digital images with ships is 42556. The number of digital images without ships is 149999.

The number of ships in a digital image is also important (Fig. 6), since the model has to understand what data it is working with and will adapt to the training data. Fig. 6 shows that most digital photos have one ship each and that the distribution is similar to a logarithmic distribution. Since the program generates a large number of artificial images, this distribution is not a problem.

Figure 6: Distribution of the number of ships in the images

To understand how masks are used, see Fig. 7:

Figure 7: Applying masks to an image

In this case, a translucent yellow color was chosen to show the location of the ships, that is, their coordinates. The segmentation task in the program is to search for ships in a digital image and select them, in other words, to find the coordinates, i.e., the masks.

Fig. 8 shows that the distribution in the test set of photos is exponential and that most of the photos have one ship. The remaining counts decrease gradually and smoothly, which allows us to evaluate the model correctly on the test set.

Figure 8: Distribution of the number of ships in the test sample

Data processing is an important step in machine learning: proper processing can greatly improve the model's results, and incorrect processing can greatly reduce the results and quality of the model. A common practice for working with images is to artificially create additional images based on the input ones.

Below is the listing that creates the AirbusDataset class, which allows various methods to be used to generate images:

Figure 9: Code listing

The class object takes 3 input parameters:
1. In_df is an array with links to photos and masks.
2. Transform is an array of functions or objects of the processing class; they are always given below.
3. Mode has 2 values, 'train' and 'validation'; depending on which one is selected, new photos are generated into the database for the corresponding sample.

As output, the class generates new objects (new photos) into the database.

4. Description of the Metrics Used

The choice of metric is also important, as the metric is involved in model training. There are always two competing concerns in such tasks. One option is to have the model detect as many ships as possible, but then the model will sometimes classify islands, waves, or other objects as ships. The other option applies when the cost of a false detection is critical: the model will rarely mislabel other objects, but very small ships (boats) may not be found, or not all points of a ship may be assigned to the same ship group.

For this work, we chose the second option, because the program should help users, and we do not want to distract them with unnecessary noise. Therefore, the BCEJaccardWithLogitsLoss metric was chosen.

The BCEJaccardWithLogitsLoss metric is a combination of two metrics, Binary Cross-Entropy (BCE) and the Jaccard coefficient, which are used to evaluate the quality of binary semantic segmentation. This metric is a good fit because we need to decide for each pixel whether it belongs to a ship or not.

BCE measures the correspondence between the predicted and true pixel values of an image. It does this by calculating the cross-entropy between the predicted and true pixel distributions. The higher the BCE, the less accurate the prediction is. The general iterative cross-entropy procedure is:
1. Select the initial vector $v^{(0)}$; take $t = 1$.
2. Generate a random sample $X_1, \dots, X_N$ from $f(X; v^{(t-1)})$.
3. Solve for $v^{(t)}$, where

$$ v^{(t)} = \underset{v}{\operatorname{argmax}} \; \frac{1}{N} \sum_{i=1}^{N} H(X_i) \, \frac{f(X_i; u)}{f(X_i; v^{(t-1)})} \, \log f(X_i; v). \tag{2} $$

When convergence is achieved, stop; otherwise, increase $t$ by 1 and proceed to step 2.

The Jaccard coefficient measures the similarity between the predicted and true values. This is done by calculating the overlap between the sets of pixels corresponding to the predicted and true regions in the image. The higher the Jaccard coefficient, the more accurate the prediction is.

The Jaccard coefficient measures the similarity between sets and is defined as the measure of the common part divided by the measure of the union of the sets:

$$ J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}, \qquad 0 \le J(A, B) \le 1, \tag{3} $$

where $A$ and $B$ are the predicted and real values. When $A$ and $B$ are both empty, the ratio is undefined (0/0), and the coefficient is usually defined as $J(A, B) = 1$ by convention.

The Jaccard coefficient is also used to find similar texts in a large corpus of documents.

BCEJaccardWithLogitsLoss combines the BCE and Jaccard coefficients into a single loss function that combines their advantages. It is used to minimize errors when training a semantic segmentation model.

5. Software and Neural Network Architecture

Creating the architecture is an important step because the number of layers matters greatly. If there are too many layers, the model will take a long time to learn and will overfit the training data; if there are too few, the model may not learn and will perform poorly on test data.

The architecture consists of 2 main parts for autoencoding. The first is the encoder, which reduces the dimensionality of the object; the down_block class was created for this. The second part is the inverse of the first: a small object must be expanded to the size of the input image. This is the decoder, which creates a new object, a matrix of 0s and 1s, where 1 marks the location of a ship on the map and 0 marks a non-ship region of the digital image.

To create a neural network, the previous blocks must be connected and combined correctly. The NN_Ship_Detection class is responsible for this.

Its basic appearance is shown in Fig. 10.

Figure 10: General view of the architecture

The architecture consists of 14 different blocks: 7 of type down_block, 6 of type up_block, and a last one that converts the object into a matrix with the size of the input image. There is also a ReLU activation function between each pair of blocks. The blocks work as follows:
1. down_block1 receives the input image and increases the number of filters from 3 to 16.
2. down_block2 increases the number of filters from 16 to 32.
3. down_block3 increases the number of filters from 32 to 64.
4. down_block4 increases the number of filters from 64 to 128.
5. down_block5 increases the number of filters from 128 to 256.
6. down_block6 increases the number of filters from 256 to 512.
7. down_block7 increases the number of filters from 512 to 1024.
8. Then normalize again.
9. up_block1 reduces the number of filters from 1024 to 512.
10. up_block2 reduces the number of filters from 512 to 256.
11. up_block3 reduces the number of filters from 256 to 128.
12. up_block4 reduces the number of filters from 128 to 64.
13. up_block5 reduces the number of filters from 64 to 32.
14. up_block6 reduces the number of filters from 32 to 3.
15. last_conv2 reduces the number of filters from 3 to 1, where the result is a matrix of zeros and ones.

So the main job of this neural network is to reduce the size of the image so that only the information about the location of the ships remains and the rest of the information is discarded. As a result, we get a new object: a matrix of zeros and ones.

6. General Discussion and Evaluation of the Results

Since this system is based on a neural network, the main subject of testing is the algorithm itself. Testing neural networks is an important part of their development and use. It is the process of evaluating the quality of a trained model on independent test data to confirm that it works properly.

Hold-out testing is the process of testing the performance of a machine learning model using a dataset that the model has not seen before.

To perform testing on a hold-out dataset, the total dataset is first divided into training, validation, and test samples. The training set is used to train the model, the validation set is used to adjust hyperparameters and evaluate the model performance on unknown data, and the test set is used for the final evaluation of the model performance.

After the model is trained, it is tested on a test set that consists of data the model did not see during training. The purpose of testing is to evaluate the model's performance on new data. The results of the testing can be used to make decisions about using the model in real-world applications, such as production tasks or research.

When testing on a hold-out dataset, it is important to keep in mind that the test results may depend on the composition of the test sample. If the test sample does not represent diverse data, a carryover problem may occur, where the model performs well on the test sample but poorly on real data. To avoid this problem, the test sample should represent as much diversity as possible.

After evaluating the results on the hold-out set, one can determine how well the model generalizes its knowledge to new data. If the results on the hold-out dataset are poor and the results on the training dataset are very good, this may indicate that the model has overfitted the training dataset.

It is also important to keep in mind that the metrics on the hold-out dataset may be slightly worse than on the training dataset, as the model has not had the opportunity to learn from this data; its role is to evaluate the overall performance of the model. However, if the difference between the results on the training and hold-out datasets is significant, it may be a sign of the model's poor ability to generalize its knowledge to new data.

Testing on a hold-out dataset is an important step in developing neural networks, as it makes it possible to check the overall performance of the model on new, previously unseen data.

Evaluation is an important step for checking whether the model is working well and whether it is overfitting or underfitting at any point. It is important to look at each iteration and choose the best one.

Using the testing method described above, we checked the quality of the image classification model. Fig. 11 shows the training process.

Figure 11: Metric results for each iteration of the model during training

The data selected for validation shows a good result. However, the graph shows slight overfitting. Overfitting occurs when the model memorizes the data it has been trained on well but performs poorly on new data. As the graph shows, the overfitting is not critical, and the model copes well.

To better understand the model's estimates, we need to look at the table (Table 2):

Table 2
Model results and comparison with analogs

| Title | 51 | 674 | 1274 | 1974 | 2600/last_res |
|---|---|---|---|---|---|
| MyModel | 0.922 | 0.102 | 0.039 | 0.029 | 0.015 |
| AlexNet | - | - | - | - | 0.004 |
| U-Net | - | - | - | - | 0.009 |

Table 2 compares 3 different models, where MyModel is the model created in the course of this work. The other two are state-of-the-art models that are publicly available and have been trained on different data. They were also trained on satellite images of ships, so the models can be compared with each other.

The table shows that the model does not segment the image well after a small number of iterations. At iteration 51, the model is wrong in about 92% of cases. With more iterations, the result improves.

It is also possible to train the model for a much higher number of iterations, but this can cause the vanishing gradient problem: the model works with very small values in a layer, computing them takes a lot of time and power, and such small gradients improve the results almost imperceptibly.

The model performed best at iteration 2600. There the model is wrong in about 1.5% of cases, which is a good result, and this result is taken as the final one.

7. Conclusions

This paper studies modern methods of working with digital images. This gives an understanding of working with artificial intelligence, namely neural networks: what types exist, what they are used for, and what their advantages and disadvantages are for achieving the maximum quality of digital image segmentation.

The architecture of the neural network was created. It was built from convolutional neural networks arranged as an autoencoder, that is, a method of encoding and decoding. The main function of this network is to take a digital image and convert it to matrix form, where the elements have values of 0 and 1: 0 is not a ship and 1 is a ship. We also analyzed the architecture components: advantages and disadvantages, metrics, activation functions, and the data the model receives at the input and produces at the output. Using this architecture, we created software for recognizing ships in digital images.

We went through all the main stages of working with the model. The first step was to search for data, which allowed us to choose the right methods for working with images.

The next step was to analyze the data to see the features of the dataset. It was found that the set had a class imbalance, with many more photos without ships than photos with ships. It was therefore decided to create data artificially: digital images that had at least one ship were flipped horizontally and vertically, which made it possible to increase the number of digital images with a ship by 3 times.

Then the data was processed, and an additional number of images with ships were created. This set can also be used in other tasks.

The next step was to create a neural network architecture using convolutional methods together with encoding and decoding (autoencoding). For this purpose, we used the Python programming language and the PyTorch library. This library is designed to work with neural networks and makes it possible to use GPU technology.

The last stage in the program's development was model training and quality assessment. The model was trained for 2600 iterations. The trained model can be used in other tasks and programs.

To measure the quality, we chose a metric for segmentation, BCEJaccardWithLogitsLoss. To evaluate the model, we split the data into training, test, and validation data. In the final testing on the test data, the trained model was wrong in 1.5% of cases relative to the correct value, which is a good result.

A possible improvement of the architecture is to turn the task into a multi-class segmentation one, where the model shows not only the location of the ship but also its type, for example, cargo, tourist, military, etc.

References

[1] O. Zinchenko, O. Zvenigorodsky, T. Kisil, Convolutional Neural Networks for Solving Computer Vision Problems, Telecommunication and Information Technologies 2(75) (2022) 4–12. doi: 10.31673/2412-4338.2022.020411.
[2] L. Yasenko, Y. Klyatchenko, Convolutional Neural Network Properties Based on an Autoencoder, Inf. Technol. Comput. Eng. 52(3) (2021) 77–85. doi: 10.31649/1999-9941-2021-52-3-77-85.
[3] H. Buck, et al., Ship Detection and Classification from Overhead Imagery, Proc. SPIE 6696, Applications of Digital Image Processing XXX, 66961C (2007). doi: 10.1117/12.754019.
[4] G. Santhalia, S. Singh, S. Singh, Safer Navigation of Ships by Image Processing & Neural Network, Second Asia International Conference on Modelling & Simulation (AMS) (2008) 660–665. doi: 10.1109/AMS.2008.48.
[5] B. Bebeshko, et al., Application of Game Theory, Fuzzy Logic and Neural Networks for Assessing Risks and Forecasting Rates of Digital Currency, J. Theor. Appl. Inf. Technol. 100(24) (2022) 7390–7404.
[6] K. Khorolska, et al., Application of a Convolutional Neural Network with a Module of Elementary Graphic Primitive Classifiers in the Problems of Recognition of Drawing Documentation and Transformation of 2D to 3D Models, J. Theor. Appl. Inf. Technol. 100(24) (2022) 7426–7437.
[7] V. Sokolov, P. Skladannyi, A. Platonenko, Video Channel Suppression Method of Unmanned Aerial Vehicles, in: IEEE 41st International Conference on Electronics and Nanotechnology (2022) 473–477. doi: 10.1109/ELNANO54667.2022.9927105.
[8] K. Eldhuset, An Automatic Ship and Ship Wake Detection System for Spaceborne SAR Images in Coastal Regions, IEEE Transactions on Geoscience and Remote Sensing 34(4) (1996) 1010–1019. doi: 10.1109/36.508418.
[9] A. Vecherkovska, S. Popereshniak, Review of Machine Learning Algorithms and Their Application for Forecasting Cryptocurrency Purchase Prices, Bulletin of Kherson National Technical University 4(87) (2023) 223–229. doi: 10.35546/kntu2078-4481.2023.4.26.
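As a supplementary illustration of the convolution described in Section 2 and Eq. (1), the following minimal sketch slides a 5×5×3 filter over a 32×32×3 image and computes w^T x + b at each position. It is not the paper's implementation: the function name conv2d_valid, the all-ones image and filter, and the choice of stride 1 with no padding are assumptions made only for illustration.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide a k x k x C filter over an H x W x C image.

    Assumes stride 1 and no padding (an illustrative choice, not taken
    from the paper). Each output value is w^T x + b for the image patch
    x currently covered by the filter w.
    """
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    k = len(kernel)
    feature_map = []
    for top in range(height - k + 1):
        row = []
        for left in range(width - k + 1):
            acc = bias
            # Multiply-and-sum over the patch under the filter.
            for dy in range(k):
                for dx in range(k):
                    for c in range(channels):
                        acc += kernel[dy][dx][c] * image[top + dy][left + dx][c]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Hypothetical inputs: a 32 x 32 x 3 image and a 5 x 5 x 3 filter, all ones.
image = [[[1.0] * 3 for _ in range(32)] for _ in range(32)]
kernel = [[[1.0] * 3 for _ in range(5)] for _ in range(5)]
feature_map = conv2d_valid(image, kernel, bias=1.0)

# 32 - 5 + 1 = 28 positions per axis; each value is 5*5*3 + 1 = 76.
print(len(feature_map), len(feature_map[0]), feature_map[0][0])
```

Under these assumptions the feature map is 28×28, which shows how a single convolution already reduces the spatial size of the image. In practice a library routine would be used instead of explicit loops.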
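As a supplementary illustration of the augmentation described in Sections 3 and 7, the sketch below creates horizontally and vertically flipped copies of an image together with its mask. The AirbusDataset listing itself is not reproduced in the text, so this is only an assumed, simplified stand-in: the function names and the tiny nested-list image are hypothetical.

```python
def flip_horizontal(grid):
    """Mirror a 2-D grid left to right."""
    return [list(reversed(row)) for row in grid]


def flip_vertical(grid):
    """Mirror a 2-D grid top to bottom."""
    return [list(row) for row in reversed(grid)]


def augment_with_flips(image, mask):
    """Return the original pair plus its horizontal and vertical flips.

    The same transform is applied to the image and to its mask so that
    ship pixels stay aligned with their labels.
    """
    return [
        (image, mask),
        (flip_horizontal(image), flip_horizontal(mask)),
        (flip_vertical(image), flip_vertical(mask)),
    ]


# Hypothetical 2 x 3 single-channel image with one "ship" pixel in the mask.
image = [[1, 2, 3], [4, 5, 6]]
mask = [[0, 0, 1], [0, 0, 0]]
samples = augment_with_flips(image, mask)
print(len(samples))
```

One original plus two flips gives three samples per ship image, which is consistent with the threefold increase in ship images reported in the conclusions.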
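As a supplementary illustration of the metric in Section 4, the sketch below computes the Jaccard coefficient of Eq. (3) for binary masks and one possible combination of per-pixel BCE with a soft Jaccard term. The paper does not show how BCEJaccardWithLogitsLoss weights its two parts, so the sum with a jaccard_weight parameter, the smoothing constant eps, and the convention J = 1 for two empty masks are assumptions, and all function names are hypothetical.

```python
import math


def jaccard_index(pred_mask, true_mask):
    """Eq. (3) for flat binary masks: |A and B| / (|A| + |B| - |A and B|)."""
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    union = sum(pred_mask) + sum(true_mask) - intersection
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed from raw scores (logits)."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form of -[y*log(p) + (1-y)*log(1-p)], p = sigmoid(z).
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def soft_jaccard(logits, targets, eps=1e-7):
    """Differentiable Jaccard surrogate using probabilities instead of 0/1."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    intersection = sum(p * y for p, y in zip(probs, targets))
    union = sum(probs) + sum(targets) - intersection
    return (intersection + eps) / (union + eps)


def bce_jaccard_with_logits_loss(logits, targets, jaccard_weight=1.0):
    """Assumed combination: BCE plus weighted (1 - soft Jaccard)."""
    return bce_with_logits(logits, targets) + jaccard_weight * (
        1.0 - soft_jaccard(logits, targets)
    )


targets = [1, 0, 1]
good = bce_jaccard_with_logits_loss([10.0, -10.0, 10.0], targets)
bad = bce_jaccard_with_logits_loss([-10.0, 10.0, -10.0], targets)
print(jaccard_index([1, 1, 0, 0], [1, 0, 1, 0]), good < bad)
```

A confident correct prediction yields a loss close to zero, while a confident wrong prediction is penalized by both terms, which is the behavior the combined metric is meant to provide.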
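As a supplementary illustration of the block list in Section 5, the sketch below rebuilds the sequence of filter counts and checks that each block's output matches the next block's input. It only models the channel bookkeeping; the internals of down_block, up_block, and NN_Ship_Detection are not given in the text, and the helper name build_channel_plan is hypothetical.

```python
def build_channel_plan():
    """Return (block name, input filters, output filters) for all 14 blocks."""
    plan = []
    channels = 3  # RGB input image
    # Encoder: seven down blocks, 3 -> 16 -> 32 -> ... -> 1024.
    for i in range(1, 8):
        out_channels = 16 * 2 ** (i - 1)
        plan.append((f"down_block{i}", channels, out_channels))
        channels = out_channels
    # Decoder: six up blocks, 1024 -> 512 -> ... -> 32 -> 3.
    for i, out_channels in enumerate([512, 256, 128, 64, 32, 3], start=1):
        plan.append((f"up_block{i}", channels, out_channels))
        channels = out_channels
    # Final convolution produces the single-channel ship mask.
    plan.append(("last_conv2", channels, 1))
    return plan


plan = build_channel_plan()
for name, in_channels, out_channels in plan:
    print(f"{name}: {in_channels} -> {out_channels}")
```

Writing the plan out this way makes it easy to confirm the count of 7 down blocks, 6 up blocks, and 1 final convolution, and that the filter counts chain without gaps.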