<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Intelligent Hybrid Mobile Robotic Landmine Detection System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Victor Sineglazov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kyrylo Lesohorskyi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Technical University of Ukraine “Ihor Sikorsky Kyiv Polytechnic Institute”</institution>
          ,
          <addr-line>Ave. Beresteysky, 37, Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National University of Ukraine «Kyiv Aviation Institute»</institution>
          ,
          <addr-line>ave. Lubomir Husar, 1, 03058, Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>904</volume>
      <fpage>16</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>This paper considers a hybrid mobile robotic system for landmine detection. The system consists of two intelligent robotic agents. The first agent is a highly mobile aerial detector with a hyperspectral / multispectral camera, designed to identify all areas where landmines are laid. The second agent is a ground-based robot with an infrared camera, which has a lower sensitivity threshold and is used to further validate the mines identified by the first agent. Semi-supervised learning with spectral-spatial consistency is used to train a CNN-based feature extraction and classification pipeline. The proposed technique increases the labeled sample size roughly threefold and achieves high recall (1.0) and moderate precision (0.694). The verification step improves precision to 0.93, reducing the number of false positives.</p>
      </abstract>
      <kwd-group>
        <kwd>Landmine detection</kwd>
        <kwd>hybrid robotic system</kwd>
        <kwd>semi-supervised learning</kwd>
        <kwd>hyperspectral imagery</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In many countries, mines pose a serious threat to life and cause economic problems. Mines are
dangerous because their location is unknown and they are often difficult to detect. The
development of new demining technologies is difficult due to the wide variety of terrains and
environmental conditions in which mines are laid, as well as the wide variety of mines. Currently,
detecting and clearing mines requires special knowledge and special equipment.</p>
      <p>Active hostilities on the territory of Ukraine have left large areas contaminated with
explosive objects, which endanger people's lives and health, prevent them from securing their
livelihoods and restoring economic activity, and damage ecosystems. Today, Ukraine is the most
mine-contaminated country in the world. According to estimates, about a third of its territory,
an area the size of Florida, is contaminated with explosive objects.</p>
      <p>
        Prompt, accurate and automated detection and identification of explosive objects has become
an extremely urgent task. At the same time, the methods and conditions of emplacement, in
particular deliberate concealment, remote scattering and the dispersal of remnants by explosions,
together with the "noisiness" of battlefield terrain and the significant (primarily technical)
limitations of detection equipment, greatly complicate the detection and identification of
explosive objects by conventional computer vision methods. This necessitates new detection
methods based on hybrid approaches. Detecting mines is complex, dangerous and expensive. The main
problem is not removing a mine but accurately determining its location. The entire operation is
still performed manually, so the pace of demining is very unsatisfactory: for every mine removed,
about 20 more are laid [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Therefore, an approach based on the latest technologies is critical: one that minimizes
risk while increasing the speed and accuracy of demining. The development of hybrid robotic
systems that include unmanned aerial and ground vehicles and can carry sensors with minimal
interaction with human operators is thus of great importance [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>This is achieved by dividing the overall system into two subsystems: sensor technologies and a
robotic device.</p>
      <p>
        Mine detection poses many problems. The first is that weather-driven changes erase the
surface traces of buried mines. The second is the need to neutralize mines without being able to
see underground. The third is that detection must establish not only the presence of a mine: the
robot must also mark the location of the mine with an accuracy of 5 cm.
Mine detection technologies need to be merged to improve their performance, as each
approach produces good results only in limited conditions. Because of these limitations, a multi-sensor
system based on signal fusion should be developed. Rather than focusing on
individual technologies operating in isolation, mine detection research and development should
emphasize design from first principles and the subsequent development of an integrated multi-sensor
system that overcomes the limitations of any single sensor technology. Combining different types
of sensors is expected to achieve better detection results [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Robot-based landmine removal</title>
      <p>Detecting mines in a surface minefield with autonomous robots is becoming
increasingly popular because it reduces the danger and cost of manual detection [19]. The robots
exert such low ground pressure that they do not trigger the mines. To cover all mined
areas effectively, the robots must support accelerated reconnaissance, especially when a
surveillance team is involved. Robot-mounted sensing is attractive for landmine detection
because of its low cost, wide availability, high data content and speed of information. Adequate
clearance rates can only be achieved using new technologies such as improved sensors, efficient
manipulators and mobile robots [16]. Estimating the position of buried landmines from
sensor data is important for selective clearance. One problem with current mine
detection robots is that they are large and very expensive. Although several
inexpensive mine detection robots have been developed, most of them use simple algorithms
and can therefore only work in simple, unobstructed environments [11]. The development of
lightweight, low-cost, semi-autonomous robots operating in conjunction with a monitoring station
(personal mine scouts) is a well-studied approach [17]. Multi-robot systems for area reduction
form the next step in the search for landmines. Some research has been carried out on a
multi-agent architecture responsible for coordinating advanced stochastic terrain analysis. It
includes a reactive obstacle avoidance technique and mission control software to plan,
configure and control operations. The system uses walking, wheeled and aerial robots. Finally, that
study describes a payload sensor system that uses Fourier analysis for efficient mine
detection.</p>
      <p>Three main developments have gradually increased the efficiency of surveys using UAVs
[13,14]:</p>
      <p>1. The UAV industry (e.g. DJI) has created very advanced systems that enable
computer-aided planning and automated aerial detection missions with multiple sensors.</p>
      <p>2. The sensor industry has provided powerful devices to match UAVs.</p>
      <p>3. The software industry has provided tools to process the records collected by UAVs,
producing results of the highest quality.</p>
      <p>
        The first UAV for humanitarian demining appeared in the EU ARC project. The use of UAVs
with visible color sensors for humanitarian mine clearance has increased over the past 5–10 years
[
        <xref ref-type="bibr" rid="ref4 ref5 ref9">4,5,9–12</xref>
        ].
      </p>
      <p>To counter the high level of false alarms, the intelligent hybrid mobile robotic mine
detection system is proposed to use two robots that exchange information.</p>
      <p>Obstacle avoidance for an autonomous mine detection robot is based on
simultaneous localization and mapping (SLAM).
The robot's position is localized while the map is updated using Gmapping [17].
The parameters are tuned so that the robot can navigate the new map and
reach the desired location, avoiding potential obstacles along the way. The path planner computes
a path that avoids the obstacles, and the local
planner sends the commands that make the robot follow this path. The robot's position is
estimated from the data of a laser scanner [18].</p>
    </sec>
    <sec id="sec-3">
      <title>3. Landmine detection technologies</title>
      <p>Today, the main means of technical inspection in mine countermeasures include:</p>
      <p>1. Metal detectors, which work on the principle of electromagnetic induction.</p>
      <p>2. Ground-penetrating radar (GPR), which uses high-frequency electromagnetic
waves to scan subsurface layers of soil, concrete, or other materials.</p>
      <p>3. Explosive vapor detectors ("electronic noses"), which detect molecules or microparticles of
explosive substances in the air.</p>
      <p>4. Mine detectors based on acoustic, ultrasonic or seismic methods.</p>
      <p>5. Non-linear junction detectors (NLJD), which detect the
non-linear electronic components commonly used in electronic devices.</p>
      <p>6. X-ray systems, which are not suited to wide areas and restrict where the
object may be located in the viewing area.</p>
      <p>7. Thermal imagers, which are less effective when temperatures have equalized (for example, at
night or when explosives have been covered for a long time).</p>
      <p>However, traditional methods used in mine countermeasures have significant limitations, which
depend not only on the physical principles of action and the technical and technological levels of
development of these methods, but above all on the specific tasks and conditions of this activity.</p>
      <p>Innovative methods of detecting explosive objects include, first of all, the analysis of
infrared and hyperspectral images (hyperspectral imaging, HSI). According to [20], despite the maturity of sensors,
signal processing algorithms remain underdeveloped and are not tied to physical phenomena;
thermal signatures are not yet sufficiently studied, and there is no comprehensive predictive
model.</p>
      <p>Practically all studies of the mine and explosive object detection methods mentioned here, both
traditional and innovative, agree that effective detection
requires combining and fusing data of different types.</p>
      <p>Research suggests using both different types of sensors and different methods of data fusion.
The hardware combination is based on the fact that multisensor systems combine technologies
with various sources of false alarms, so they can significantly reduce the frequency of false alarms.
The most powerful methods of processing infrared and hyperspectral images are methods of
artificial intelligence.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Objects and features</title>
      <p>The measured data in hyperspectral images can be visualized as a data cube. A
hyperspectral cube is a three-dimensional array of data that represents a set of spectral images of
an object captured in various narrow bands of the electromagnetic spectrum. The
hyperspectral cube consists of two spatial dimensions (x, y), which give the
location of pixels in the image, and a third dimension, the spectral axis (λ), along which
the intensity of light or signal at each wavelength is recorded. Each slice of the data cube contains an image of
the scene at a specific wavelength. Each pixel is associated with a vector of spectral responses,
otherwise known as a spectral signature. The signal intensity (recorded for each pixel and
each spectral channel) is the main parameter that reflects the amount of electromagnetic radiation
recorded by the sensor. The hyperspectral cube is thus defined by its spatial and spectral structures.
The spatial structure is characterized by:</p>
      <p>a) resolution (spatial resolution), which determines the detailing of spatial information, i.e. high
spatial resolution makes it possible to recognize small objects in the image;</p>
      <p>b) size in space: dimensionality along the axes x and y corresponds to the number of pixels that
cover the area of the object or scene.</p>
      <p>The spectral structure of the hyperspectral cube is determined by:</p>
      <p>a) the number of spectral channels (bands), i.e. the number of narrow spectral bands in which
the signal is recorded, usually from several tens to hundreds of channels;</p>
      <p>b) spectral resolution, which determines the width of each spectral channel (for example 1–10
nm); higher spectral resolution makes it possible to better distinguish materials;</p>
      <p>c) wavelengths that cover a certain range of the electromagnetic spectrum.</p>
      <p>The dimensionality of the data, in particular the large number of spectral channels, determines the
high computational complexity of hyperspectral cube processing algorithms. An
important characteristic of the hyperspectral cube is the signal-to-noise ratio (SNR), which
determines the quality of spectral information: a high SNR value provides a more accurate
determination of material characteristics. Some hyperspectral cubes include a time component,
which adds another measurement axis. This is used to monitor dynamic processes (such as changes
in vegetation) or highly dynamic processes (targeting). The data of hyperspectral cubes are highly
informative, because they contain detailed information about the physical, chemical, biological, etc.
properties of objects. Another property of hyperspectral cube data is the spectral correlation of
channels, which should be taken into account during intelligent image processing. Signal intensity
can be expressed in relative units (reflectance, emissivity, etc.) or calibrated physical units.
Calibrated units take into account the physical properties of the scene, the sensor, and the
environment in which the measurement is made. They make it possible to obtain accurate data
about the physical parameters of the signal, for example, energy, flow or radiation intensity.
Calibration involves taking into account the spectral sensitivity of the sensor, atmospheric
conditions (scattering, absorption), scene geometry (angle of light incidence/reflection), power of
the light source. It should be taken into account that calibration requires additional time,
equipment and resources.</p>
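      <p>As an illustration of the cube layout described above, the following minimal Python sketch stores a toy cube as nested lists indexed as cube[y][x][band] and extracts one spectral signature and one band image. The dimensions, the synthetic values and the helper names spectral_signature and band_slice are assumptions made for this example; they are not part of the described system.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Toy hyperspectral cube stored as cube[y][x][band]; all values are synthetic.
HEIGHT, WIDTH, BANDS = 2, 3, 4

cube = [[[float(y * 100 + x * 10 + b) for b in range(BANDS)]
         for x in range(WIDTH)]
        for y in range(HEIGHT)]


def spectral_signature(cube, x, y):
    """Return the vector of spectral responses recorded at pixel (x, y)."""
    return list(cube[y][x])


def band_slice(cube, band):
    """Return the 2-D image of the scene at a single spectral band."""
    return [[pixel[band] for pixel in row] for row in cube]


signature = spectral_signature(cube, x=2, y=1)  # one value per band
first_band = band_slice(cube, band=0)           # HEIGHT x WIDTH image
```
]]></preformat>
      <p>Real sensors deliver tens to hundreds of bands, so array libraries are normally used instead of nested lists; the indexing idea stays the same.</p>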
      <p>As of today, the following features can be considered promising for the remote detection of mines and
explosive devices by intelligent processing of hyperspectral images:</p>
      <p>1) the mines and explosive objects themselves, as physical objects;</p>
      <p>2) spectral signatures of explosives and associated substances;</p>
      <p>3) disturbance of land cover and vegetation;</p>
      <p>4) marking of dangerous objects and zones (in conventional cases);</p>
      <p>5) unmasking signs on the terrain (in case of concealment).</p>
      <p>The main task of hyperspectral analysis in demining territories
and disarming explosive devices can be the detection and identification of the substances used in
explosive devices. The spectral signatures of these substances
determine the wavelength ranges of electromagnetic radiation that should be
investigated using hyperspectral analysis and, therefore, the technical requirements for equipment
and, ultimately, the methods of intelligent analysis of hyperspectral images. However, methods of
detection and identification of substances in demining tasks have certain limitations.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Problem Statement</title>
      <p>In this work, mine detection is treated as a segmentation task: pixels that
correspond to mines and other objects of interest must be selected in the target images. More
formally, an image I is represented as a tensor of size H × W × C, where H is the height, W is
the width, and C is the number of channels. Each element x of the tensor represents an image
pixel in one channel and contains a single real value x ∈ ℝ, which corresponds to the "brightness" of the
surface at this pixel. The work solves the task of constructing a segmentation function that
transforms the input image I into a segmentation map I' of size H × W, where each element y
belongs to the class set {∅, 1, 2, 3, ..., c}, that is, it is labeled either as empty space (∅) or as explosive ordnance.</p>
      <p>When evaluating the quality of an intelligent system, it is important to use realistic standards. The
UN standard for demining landmines requires a 99.6% explosive ordnance detection rate for
humanitarian demining. However, this standard does not establish an acceptable level of false positives. Such
false alarms do not endanger the life and health of the personnel performing demining, but they
significantly slow the process down and make it more expensive. To overcome this shortcoming, the
use of a hybrid system of two autonomous devices is proposed. The first device is mobile and
allows for quick processing of a large area, and the second verifies the regions flagged by the first.</p>
      <p>Since complex high-dimensional images are used to solve the segmentation problem, the problem
of a limited data set arises. Collecting data with one sensor is labor-intensive, and adding a second,
different type of sensor complicates the process even more. Semi-supervised
learning is used to overcome this limitation. Formally, the process is described as the training of an
approximator f(x, θ), where x ∈ X is the input space, y ∈ Y is the output space, and θ are the
approximator's parameters, derived through a training process. The weights are learned by the
training function T(f, L, U) = θ, where T is the training function, f is the approximator,
L = {(x<sub>1</sub>, y<sub>1</sub>), …, (x<sub>n</sub>, y<sub>n</sub>)}, x ∈ X, y ∈ Y, is the labeled dataset, and
U = {x<sub>n+1</sub>, …, x<sub>m</sub>}, x ∈ X, is the unlabeled dataset. It should be noted that, at the inference stage, the approximator obtained by
semi-supervised learning does not differ from an approximator obtained by classical supervised
learning.</p>
      <p>When evaluating the effectiveness of the proposed solution, it is important to choose the right
metrics, as it is important to correctly classify the mine, and to avoid errors of the first kind (false
negatives).</p>
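      <p>The following metrics will be used in this work:</p>
      <p>Precision (Mine / No Mine) – the ratio of true positive results to the total number of true positive and false positive
results:</p>
      <disp-formula><tex-math>\mathrm{Precision} = \frac{TP}{TP + FP};</tex-math></disp-formula>
      <p>Recall (Mine / No Mine) – the ratio of true positive results to the total number of true positive and false negative
results. When detecting mines, it is very important not to make errors of the first kind (false negatives), which
makes recall one of the key metrics:</p>
      <disp-formula><tex-math>\mathrm{Recall} = \frac{TP}{TP + FN};</tex-math></disp-formula>
      <p>Rand index – the ratio of true positive and true negative predictions to all predictions:</p>
      <disp-formula><tex-math>\mathrm{Rand} = \frac{TP + TN}{TP + TN + FP + FN}.</tex-math></disp-formula>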
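      <p>The three metrics can be computed directly from confusion counts. The sketch below is a minimal Python illustration; the function names and the counts in the usage lines are hypothetical and are not results from this work. Returning 0.0 for an empty denominator is a convention chosen for the example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def precision(tp, fp):
    """TP / (TP + FP); 0.0 when nothing was predicted positive (assumed convention)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """TP / (TP + FN); 0.0 when there are no positive samples (assumed convention)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def rand_index(tp, tn, fp, fn):
    """(TP + TN) / all predictions."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical confusion counts for a "Mine / No Mine" decision.
tp, tn, fp, fn = 9, 90, 1, 0
scores = (precision(tp, fp), recall(tp, fn), rand_index(tp, tn, fp, fn))
```
]]></preformat>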
    </sec>
    <sec id="sec-6">
      <title>6. Method</title>
      <sec id="sec-6-1">
        <title>6.1. General description of the approach</title>
        <p>The proposed approach uses two devices for remote survey of the mined
area and detection of regions that contain mines. The first device is a highly mobile quadcopter
whose task is to detect areas with a high probability of containing explosive objects. For this, it carries a
hyperspectral or multispectral camera capable of detecting mines on the surface or in the
ground at a depth of up to 10 centimeters. Its intelligent system is highly
sensitive and therefore tolerates a large number of errors of the second kind (false positives).</p>
        <p>To compensate for this, a second, less mobile, ground-based device is used. It is equipped
with a more sensitive sensor (lidar, ground-penetrating radar, magnetometers). At the same time,
its intelligent system is less sensitive, which significantly reduces the level of errors of the second kind (false positives) without
creating additional risk to personnel. This approach has the following advantages:
1. This removes restrictions on the use of heavy and bulky sensors that cannot be installed on
a quadcopter;
2. This makes it possible to speed up the collection of data by a ground robot by optimal route
planning and scanning of areas with a high probability of exposure to explosive objects,
without visiting areas that were confirmed to not have any landmines;
3. This enables usage of a combination of several sensors. The specific types of sensors are
chosen depending on the task, but at the post-processing stage, the sensor fusion technique
can be used to obtain an even richer set of data.</p>
        <p>The general scheme of the system is shown in Figure 1.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Data Processing</title>
        <p>A popular trend in hyperspectral data processing is a comprehensive data processing pipeline that
denoises, normalizes, filters, and (if necessary) reduces the dimensionality of the input data. Because this
work uses a hybrid approach with two intelligent systems whose sensitivity is controlled
individually, a lightweight preprocessing pipeline is used instead. The pipeline is designed to produce
high-quality contrast with minimal blurring or artifacts in the original image. The
proposed preprocessing consists of local and then global contrast stretching via normalization. Normalization
operates on a grayscale representation of the pixels. After the corresponding
coefficients are calculated, each channel is scaled by multiplying its values by the corresponding coefficient.</p>
        <p>When the elements of the scene are illuminated unevenly, the same objects can show different local
contrast. Since contrast strength is an important feature, it must be normalized.
For this, local normalization based on convolution is used:</p>
        <disp-formula><label>(1)</label><tex-math>I_n(x, y) = \mathit{newMax} \cdot \frac{I(x, y) - \min I(x, y)}{\max I(x, y) - \min I(x, y)},</tex-math></disp-formula>
        <p>where newMax is the maximum value after the normalization, I(x, y) is the value of pixel (x, y) in the
original image, min I(x, y) is the minimum value within the convolution kernel centered on pixel (x, y), and
max I(x, y) is the maximum value within the convolution kernel centered on pixel (x, y).</p>
        <p>For local convolution, it is recommended to use a medium-sized window (this work uses a kernel of
25×25), however, specific settings depend on the data set.</p>
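        <p>A minimal Python sketch of the local normalization in Eq. (1) is given below. It assumes a single-channel image stored as a list of rows; clipping the window at the image border and mapping flat windows to zero are assumptions of the sketch, since the text does not specify either case. The function name local_normalize is hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def local_normalize(image, kernel=25, new_max=255.0):
    """Stretch each pixel against the min/max of its kernel x kernel window."""
    height, width = len(image), len(image[0])
    half = kernel // 2
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Assumption: the window is clipped at the image border.
            window = [image[j][i]
                      for j in range(max(0, y - half), min(height, y + half + 1))
                      for i in range(max(0, x - half), min(width, x + half + 1))]
            lo, hi = min(window), max(window)
            # Assumption: a flat window has no contrast to stretch, so map it to 0.
            if hi > lo:
                result[y][x] = new_max * (image[y][x] - lo) / (hi - lo)
    return result


# Small example with a 3x3 window instead of the 25x25 window used in the paper.
stretched = local_normalize([[10, 20, 30], [40, 50, 60], [70, 80, 90]], kernel=3)
```
]]></preformat>
        <p>The nested loops keep the sketch dependency-free; on real imagery, sliding minimum and maximum filters from an array library would be the practical choice.</p>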
        <p>Local normalization is followed by global normalization (within a batch during training or a
window during inference). Global normalization is implemented as linear normalization, which
makes it possible to reduce the density of the input distribution space:</p>
        <disp-formula><label>(2)</label><tex-math>I_N = (I - \mathit{Min}) \cdot \frac{\mathit{newMax} - \mathit{newMin}}{\mathit{Max} - \mathit{Min}} + \mathit{newMin},</tex-math></disp-formula>
        <p>where Min is the minimum brightness value in the original image, Max is the maximum brightness
value in the original image, newMax is the new maximum value in the image, and newMin is the new
minimum value in the image. Linear normalization uses the standard parameters newMax = 255,
newMin = 0.</p>
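        <p>The linear normalization of Eq. (2) can be sketched in a few lines of Python. The function name global_normalize is hypothetical, the image is assumed to be a list of rows, and returning newMin for a constant image is an assumption made to avoid division by zero.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def global_normalize(image, new_min=0.0, new_max=255.0):
    """Linearly rescale all pixel values from [Min, Max] to [new_min, new_max]."""
    flat = [value for row in image for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Assumption: a constant image carries no contrast; map it to new_min.
        return [[new_min] * len(row) for row in image]
    scale = (new_max - new_min) / (hi - lo)
    return [[(value - lo) * scale + new_min for value in row] for row in image]


rescaled = global_normalize([[2, 4], [6, 10]])
```
]]></preformat>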
      </sec>
      <sec id="sec-6-3">
        <title>6.3. Convolutional Neural Networks for feature extraction and segmentation</title>
        <p>Artificial convolutional neural networks are used in the work to extract features and
perform segmentation. This type of neural networks makes it possible to flexibly process features
from high-dimensional data when solving a classification problem. This makes it possible to use the
same architecture for segmentation and feature extraction from a wide range of sensors that can be
reduced to a tensor format.</p>
        <p>The use of a homogeneous architecture significantly simplifies the learning process and also
expands the number of usable sensor types, as it allows unified processing of both
low-dimensional and high-dimensional data.</p>
        <p>The classic U-Net architecture with residual connections was chosen as the architecture of the
segmenter. The architecture of the neural network is shown in Figure 2.</p>
        <p>The proposed architecture consists of three components:</p>
        <p>1. Input adapter – this block consists of several 3D convolution layers and brings the input
layer to a fixed dimension. Its main task is spectral compression,
which highlights features while reducing the resolution of the input; the input data is
therefore processed incrementally by fine-grained kernels.</p>
        <p>2. Feature detection path – this block follows the classical U-Net architecture and contains
contracting (convolutional) and expanding (deconvolutional) paths. Each of these paths consists of three blocks of
convolution (or deconvolution) and feature detection. Residual connections
between the corresponding blocks of the two paths prevent vanishing gradients
and stabilize learning. A convolution block consists of a max-pooling layer and two
consecutive convolution operations. A deconvolution block consists of two consecutive
convolution operations and one upsampling operation.</p>
        <p>3. Segmentation adapter – expands the resulting feature map to the target size, producing a
segmentation map that fits the size of the input. This layer also uses three-dimensional
convolutions to combine features from different channels into one, forming at the output a
segmentation map of size H × W.</p>
      </sec>
      <sec id="sec-6-4">
        <title>6.4. Segmentator Training</title>
        <p>When creating a data set for the proposed approach, it is important to take into account the nature of
the task and the specificity of the data.</p>
        <p>To detect explosive objects, the images must have a high resolution, which makes
orthophotomosaic segmentation techniques impractical. Recognition must take place at the level of
individual drone images, which reduces dimensionality but increases the number of individual
samples that need to be labeled. The collected data contains a high level of noise, which may have a
signature similar to that of the target objects. The dataset itself is also unbalanced: it contains a large
number of "empty" images with noise and a small number of images with explosive objects. Taking
this into account, the segmenters are trained using semi-supervised learning based on spatial consistency with proxy labeling,
together with modified loss functions and learning modes.</p>
        <p>This paper considers a proxy labeling method based on the principles of smoothness and
clustering. The idea is that when a clustering algorithm is applied, all pixels belonging to the
landmine class should fall into the same cluster, which allows the class label to be propagated
within the same image. The disadvantage of this approach is that the propagation is sensitive to
noise, and transferring labels between different images requires complex
cluster similarity metrics, which significantly reduces the accuracy of the pseudo-labels. To
compensate for these shortcomings, this work introduces an additional assumption:
temporal-spatial cluster consistency. The intuition is that a labeled image x<sub>t</sub> and an
unlabeled image x'<sub>t+1</sub> that share a certain overlap both contain the object of interest. If the
direction of optical flow is known, an approximate location of the object of interest in x'<sub>t+1</sub> can be
derived by offsetting the location of the object in x<sub>t</sub> by the estimated camera movement.
This yields a distribution of the possible locations of the object in the image, with the
probability estimated from the distance. The image is then clustered,
and the intersection between the detected clusters and the object's location probability distribution
is calculated and used to verify the accuracy of the pseudo-labels. The idea is depicted
in Fig. 3.</p>
        <p>Analytically, the procedure is moderately complex. The first step is to calculate the
probability map of the object's location. This is achieved by applying a convolution operator with a
Gaussian kernel, which is defined as:</p>
        <disp-formula><label>(3)</label><tex-math>K_{\mathrm{Gaussian}}(x, y) = \frac{1}{2\pi\sigma^{2}} \, e^{-\frac{x^{2} + y^{2}}{2\sigma^{2}}}.</tex-math></disp-formula>
        <p>This kernel is applied to the labeled image x<sub>t</sub> in order to generate a probability map p<sub>t</sub>. The next
step is to cluster the data of the unlabeled image x<sub>t+1</sub>. Any algorithm can be used for clustering;
KNN is used in this work. The clustering algorithm is applied to each spectral channel, forming
cluster masks m<sub>t,1</sub> … m<sub>t,c</sub>.</p>
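        <p>The probability-map step can be sketched as follows, assuming the standard two-dimensional Gaussian kernel and a binary label mask stored as a list of rows. The kernel radius, zero padding at the image border and the function names are assumptions of this sketch; the offset by the estimated camera movement described earlier is omitted for brevity.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def gaussian_kernel(radius, sigma):
    """Sample exp(-(x^2 + y^2) / (2 sigma^2)) / (2 pi sigma^2) on a square grid."""
    norm = 2.0 * math.pi * sigma * sigma
    return [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) / norm
             for x in range(-radius, radius + 1)]
            for y in range(-radius, radius + 1)]


def probability_map(mask, kernel):
    """Smooth a binary label mask with the kernel (zero padding at the border).

    The kernel is symmetric, so correlation and convolution coincide here.
    """
    height, width = len(mask), len(mask[0])
    radius = len(kernel) // 2
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width:
                        total += kernel[dy + radius][dx + radius] * mask[yy][xx]
            result[y][x] = total
    return result


# A single labeled pixel in the middle of a 5x5 mask.
mask = [[0] * 5 for _ in range(5)]
mask[2][2] = 1
p_t = probability_map(mask, gaussian_kernel(radius=2, sigma=1.0))
```
]]></preformat>
        <p>The resulting map peaks at the labeled pixel and decays with distance, which matches the distance-based probability estimate described in the text.</p>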
        <p>After calculating the cluster masks for each spectrum, the metric of the ratio of the noise to the
positive signal of the label of each of the clusters in the image is calculated:</p>
        <disp-formula><label>(4)</label><tex-math>LS_{m_t}^{k}(x, y) = \frac{\sum_{(x, y) \in k} \sum_{w \in K_{\mathrm{Gaussian}}(x, y)} w \cdot x_t(x, y)}{|k|},</tex-math></disp-formula>
        <p>where k is a cluster of m<sub>t</sub>, w is a pixel's weight in the Gaussian kernel, and x<sub>t</sub> is the labeled image. The
value of LS is bounded by [0, 1] and does not require additional normalization.</p>
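        <p>One possible reading of Eq. (4) is that the inner sum is the Gaussian-smoothed label value at a pixel, so the score of a cluster is the mean of the probability map over the pixels of that cluster. The Python sketch below implements this reading only; the function name label_score and the toy inputs are assumptions, and the interpretation itself should be checked against the original formula.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def label_score(prob_map, cluster_mask):
    """Mean smoothed-label value over the pixels that belong to one cluster."""
    values = [prob_map[y][x]
              for y, row in enumerate(cluster_mask)
              for x, inside in enumerate(row)
              if inside]
    # Assumption: an empty cluster receives a score of 0.
    return sum(values) / len(values) if values else 0.0


# Toy probability map and one cluster mask of the same size.
toy_prob_map = [[0.0, 0.5],
                [1.0, 0.5]]
toy_cluster = [[0, 1],
               [1, 0]]
score = label_score(toy_prob_map, toy_cluster)
```
]]></preformat>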
        <p>The noise ratio is calculated for each of the spectral channels, after which the spectral consistency
metric is calculated; it determines whether the
label is assigned to each of the pixels. Landmine features may be clearly present in only one (or a few)
spectral channels, so a logarithmic transformation is used to significantly increase the influence
of high-impact channels on the pseudo-labeling results. Spectral consistency is calculated as:</p>
        <disp-formula><label>(5)</label><tex-math>SC_t(x, y) = \frac{\sum \log\left(1 + \alpha \cdot LS_{m_t}^{k}(x, y)\right)}{|c| \cdot \log(1 + \alpha)},</tex-math></disp-formula>
        <p>where the sum runs over the spectral channels, |c| is the number of spectral channels, and α is a scaling hyperparameter; the recommended range is
[10, 1000], depending on the number of channels.</p>
        <p>The last step of the algorithm is to cut off low-probability labels. A standard cut-off
based on the ConfidenceThreshold parameter (CT) is used for this: pseudolabels whose spectral
consistency is less than CT are filtered out and not used in further training.</p>
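        <p>A minimal sketch of this cut-off is shown below. The function name and the threshold value in the example are hypothetical; the text does not give the value of CT that was used.</p>
        <preformat>
```python
def filter_pseudo_labels(consistency_map, threshold):
    """Keep a pseudo-label (1) only where spectral consistency reaches the threshold CT."""
    return [[1 if sc >= threshold else 0 for sc in row] for row in consistency_map]


# Illustrative consistency values and an assumed CT of 0.7.
kept = filter_pseudo_labels([[0.9, 0.4], [0.7, 0.69]], threshold=0.7)
```
        </preformat>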
        <p>The pseudo-labeling is performed iteratively. For each labeled image in which a landmine segment is
present, a breadth-first search checks neighboring images in the order of optical flow and
propagates the landmine segmentation pseudo-labels. The propagation continues for as long as the average
spectral consistency SC_t remains above the confidence threshold. This iterative process is
repeated for every labeled image that contains a landmine segment.</p>
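        <p>The propagation described above can be sketched as a breadth-first traversal over frames. This is an illustration under assumptions, not the authors' implementation: the neighbor map (taken to be ordered by optical flow) and the mean_consistency callback, which stands in for the average SC of the labels carried from one frame to the next, are hypothetical.</p>
        <preformat>
```python
from collections import deque


def propagate_pseudo_labels(seed, neighbors, mean_consistency, threshold):
    """Breadth-first propagation of pseudo-labels starting from one labeled frame.

    neighbors: dict mapping a frame id to its adjacent frame ids.
    mean_consistency(src, dst): hypothetical callback returning the average
    spectral consistency of the labels propagated from src to dst.
    """
    accepted = []
    visited = {seed}
    queue = deque([seed])
    while queue:
        src = queue.popleft()
        for dst in neighbors.get(src, []):
            if dst in visited:
                continue
            visited.add(dst)
            # Stop expanding a branch once consistency drops below the threshold.
            if mean_consistency(src, dst) >= threshold:
                accepted.append(dst)
                queue.append(dst)
    return accepted


# Illustrative frame graph and scores; frame 0 is the manually labeled seed.
neighbors = {0: [1, -1], 1: [2], 2: [3], -1: [-2]}
scores = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.4, (0, -1): 0.3}
labeled = propagate_pseudo_labels(0, neighbors, lambda s, d: scores[(s, d)], 0.7)
```
        </preformat>
        <p>In this example the labels reach frames 1 and 2 and stop before frame 3, and the branch through frame -1 is rejected immediately, so frame -2 is never evaluated.</p>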
        <p>An important component of system training is sensitivity adjustment during segmenter training.
The following features must be taken into account during training:
(1) explosive ordnance is a minority class in the data set, which leads to class imbalance;
(2) one of the models should have high sensitivity and low specificity, while the other should
have medium sensitivity and high specificity, even though both models share the same architecture
and learning method.</p>
        <p>Taking these conditions into account, the work uses two approaches to training segmenters. In both
cases, the same loss function is used, the weighted binary cross-entropy:</p>
        <p>$$L_{WBCE} = -E\left[ w_1 \cdot y_{true} \cdot \log(y_{pred}) + w_0 \cdot (1 - y_{true}) \cdot \log(1 - y_{pred}) \right] \qquad (6)$$</p>
        <p>where w_1 and w_0 are the weights of the positive and negative classes respectively, y_pred is the
predicted value, and y_true is the label value.</p>
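        <p>The effect of the class weights can be seen in a small numerical sketch of the weighted binary cross-entropy. The function name and the eps clamp (added to avoid taking the logarithm of zero) are assumptions; the weights in the example are the ones reported for the sensitive segmenter.</p>
        <preformat>
```python
import math


def weighted_bce(y_true, y_pred, w1, w0, eps=1e-7):
    """Mean weighted binary cross-entropy over flat sequences of labels and predictions."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # assumption: clamp predictions to avoid log(0)
        total -= w1 * t * math.log(p) + w0 * (1.0 - t) * math.log(1.0 - p)
    return total / len(y_true)


# With w1 = 0.8 and w0 = 0.2, a missed mine costs more than a false alarm of equal confidence.
missed_mine = weighted_bce([1], [0.1], w1=0.8, w0=0.2)
false_alarm = weighted_bce([0], [0.9], w1=0.8, w0=0.2)
```
        </preformat>
        <p>Here the missed mine is penalized about four times as heavily as the false alarm, which is what pushes the sensitive segmenter toward high recall.</p>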
        <p>When training the sensitive segmenter, more aggressive weights (w_1 = 0.8, w_0 = 0.2) are used,
which reduces specificity but in turn increases recall. The model is trained
on the unbalanced dataset using batch normalization.</p>
        <p>When training the more specific classifier, less aggressive weights (w_1 = 0.6, w_0 = 0.4) are used, which
increases the precision of this classifier. In addition, resampling is applied during training so that
explosive ordnance is present in 50% of the samples in each batch. This
resampling balances the training and makes it possible to achieve high accuracy and specificity for the
validator sensor.</p>
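        <p>One simple way to obtain batches in which half of the samples contain explosive ordnance is to draw the two classes separately. The sketch below does this with replacement, so that the minority class is oversampled; the function name and the sampling-with-replacement choice are assumptions, since the text does not describe how the resampling is implemented.</p>
        <preformat>
```python
import random


def balanced_batch(positives, negatives, batch_size, rng=None):
    """Draw a batch in which half of the samples contain explosive ordnance."""
    rng = rng or random.Random()
    n_pos = batch_size // 2
    n_neg = batch_size - n_pos
    # Sampling with replacement lets the minority class be oversampled.
    batch = rng.choices(positives, k=n_pos) + rng.choices(negatives, k=n_neg)
    rng.shuffle(batch)
    return batch


# Illustrative data: 3 positive samples against 50 negative ones.
positives = [("mine", i) for i in range(3)]
negatives = [("clear", i) for i in range(50)]
batch = balanced_batch(positives, negatives, batch_size=8, rng=random.Random(0))
```
        </preformat>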
        <p>The neural networks are trained with the Adam optimizer. The learning rate is initially set to
0.001 and decays stepwise down to a minimum value of 0.0001.</p>
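        <p>A stepwise schedule with a floor can be written in one line. The initial and minimum learning rates below come from the text; the step size and decay factor are not reported, so the values used here (halving every 10 epochs) are purely illustrative assumptions.</p>
        <preformat>
```python
def stepwise_lr(epoch, base_lr=1e-3, min_lr=1e-4, step_size=10, gamma=0.5):
    """Multiply the learning rate by gamma every step_size epochs, never going below min_lr."""
    return max(min_lr, base_lr * gamma ** (epoch // step_size))


# Learning rate at the start of a few selected epochs under the assumed schedule.
schedule = {epoch: stepwise_lr(epoch) for epoch in (0, 10, 20, 30, 40)}
```
        </preformat>
        <p>Under these assumed settings the rate reaches the 0.0001 floor after the fourth decay step and stays there for the rest of training.</p>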
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Results</title>
      <p>The method was verified by analyzing the effectiveness of the
intelligent system on experimental data. The experiment was conducted
using two agents: a drone with a multispectral camera and a cart with an infrared camera.</p>
      <p>Data for the experiment were collected in the Khmelnytskyi region. TM-62 mine simulators were
installed on a site with a total area of 0.5 square kilometers. Two
installation configurations were considered: directly on the surface and at a
depth of up to 10 centimeters. After installation, an unmanned aerial vehicle with a multispectral
camera flew over the site, followed by a pass over the known mine locations with a cart
carrying an infrared camera. The data were collected in summer, in warm (daytime temperatures up to +23 degrees
Celsius), sunny weather, which improves the quality of data collected with an infrared
camera. Grass cover was present and was partially disturbed where the mines were placed.</p>
      <p>Two types of cameras were used for data collection. The multispectral camera built into the Mavic
3M drone captures 5 multispectral channels and the visible spectrum; the flight was carried out at
a height of 10 meters above the surface. For the ground drone, a ZH20T thermal camera
mounted on a cart was used, with the camera at a height of up to 20 centimeters during filming. To
capture footage, the camera must be directed perpendicular to the ground, which
makes navigation difficult.</p>
      <p>Pseudolabeling significantly increases the number of labeled samples without requiring substantial
labeling resources. The labeling results show that pseudolabeling efficiency
strongly depends on the features of the sensor. Multispectral data contains several channels,
which makes spectral consistency an effective metric, while infrared data contains only one channel,
which significantly reduces the effectiveness of the consistency metric and requires a higher
threshold CT. The results of the pseudolabeling used to initialize the segmenters are shown in
Table 1.</p>
      <p>After pseudolabeling, the models were trained in cross-validation mode, with a 90%/10% split into
training and validation datasets. The confusion matrices of the segmenters are presented in Table
2, and the general learning metrics are presented in Table 3.</p>
      <p>Overall, the approach works well; however, the semi-supervised learning could be tuned further.
Table 2 shows that the validator agent has 1 false negative, which is not acceptable under
real-world conditions. This false negative was traced to a blurry image in the vicinity of a labeled
landmine, which was not properly labeled in the frame. The issue was resolved by removing
the blurry image from the training set; however, to adapt the method to fluctuations in the input
data, either regularization or data filtering techniques should be explored further.
</p>
    </sec>
    <sec>
      <title>Conclusions</title>
      <p>The proposed method makes it possible to train two classifiers within a unified framework. The
semi-supervised methodology increases the labeled sample size roughly threefold for the datasets used in the study. The achieved
results are in line with state-of-the-art methods: the detector provides high recall and moderate precision
(0.694), which the verification agent further refines to a precision of 0.93.
The proposed method, however, is vulnerable to noise in the data, which creates challenges during
semi-supervised learning. As such, future research will focus on stabilizing
semi-supervised learning, either by adding filters to the input data that detect, remove, or pre-process
low-quality samples, or by using regularization to ensure stable learning.</p>
      <p>Future research will focus on improving two key parts of the proposed method, namely
semi-supervised learning for pseudolabeling and sensitivity training for different agents.
The pseudolabeling step struggles with rapid shifts of the scene, especially when motion blur is introduced.
Adding a regularization parameter should stabilize the learning and improve the generalization of the
semi-supervised step. As for sensitivity training, the current
approach uses the same network architecture for both agents, with sensitivity tuned
for each agent independently. This leads to longer training time, which could be reduced by
adjusting the training routine to reuse the weights learned by one agent when training the second.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
      <p>Lisica, D. Evaluation on use of UAVs in-country assessment of suspected hazardous
areas in Bosnia and Herzegovina 2019. In Proceedings of the Norwegian People’s Aid
Workshop on Lessons Learned from the Use of Unmanned Aerial Vehicles for the
Identification and Assessment of Explosive Devices Threats, Podgorica, Montenegro, 16–17
October 2019.</p>
      <p>Sineglazov, V.M., Ischenko, V.P. Integrated navigation complex of UAV on basis of
flight controller. 2015 IEEE 3rd International Conference Actual Problems of Unmanned
Aerial Vehicles Developments, APUAVD 2015 - Proceedings, pp. 20–25, 7346547,
2015.</p>
      <p>Sineglazov, V.M. Computer aided-design problems of unmanned aerial vehicles
integrated navigation complexes. 2014 IEEE 3rd International Conference on Methods and
Systems of Navigation and Motion Control, MSNMC 2014 - Proceedings, pp. 9–14,
6979716, 2014.</p>
      <p>Kamarudin, K., Mamduh, S. M., Shakaff, A. Y. M., Saad, S. M., Zakaria, A., Abdullah,
A. H., and Kamarudin, L. M. "Method to convert Kinect's 3D depth data to a 2D map for
indoor SLAM." in 2013 IEEE 9th international colloquium on signal processing and
its applications.</p>
      <p>Kneip, L., Tâche, F., Caprari, G., and Siegwart, R. "Characterization of the
compact Hokuyo URG-04LX 2D laser range scanner." in 2009 IEEE International
Conference on Robotics and Automation.</p>
      <p>Gibson J. M., Lockwood J. R. Alternatives for Landmine Detection. – Santa Monica,
CA: RAND Corporation, 2003. – XXX, 336 p.</p>
      <p>Zgurovsky, M., Sineglazov, V., Chumachenko, E. (2021). Classification and Analysis
of Multicriteria Optimization Methods. In: Artificial Intelligence Systems Based on Hybrid
Neural Networks. Studies in Computational Intelligence, vol 904. Springer, Cham.
https://doi.org/10.1007/978-3-030-48453-8_2</p>
      <p>Li N., Wang Z., Cheikh F. A. Discriminating Spectral–Spatial Feature Extraction for
Hyperspectral Image Classification: A Review // MDPI Sensors. – 2024. – 24, 2987. –
https://doi.org/10.3390/s24102987. – 32 p.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Ismail</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elmogy</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and ElBakry, H.,
          <article-title>"Landmines detection using low-cost multisensory mobile robot,"</article-title>
          <source>Journal of Convergence Information Technology</source>
          ,
          <year>2015</year>
          .
          <volume>10</volume>
          (
          <issue>6</issue>
          ): p.
          <fpage>51</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H. G.</given-names>
            <surname>Najjaran</surname>
          </string-name>
          ,
          <article-title>"Using genetic algorithms and neural networks for surface land mine detection,"</article-title>
          <source>IEEE Transactions on Signal Processing</source>
          ,
          <volume>47</volume>
          , pp.
          <fpage>176</fpage>
          -
          <lpage>186</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Moamen</given-names>
            <surname>Mohamed</surname>
          </string-name>
          , et al.
          <article-title>Autonomous Landmine Detection Robot Using SLAM navigation Algorithm</article-title>
          .
          <source>Journal of International Society for Science and Engineering</source>
          , Vol.
          <volume>4</volume>
          , No.
          <issue>4</issue>
          ,
          <fpage>81</fpage>
          -
          <lpage>91</lpage>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Anuar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Youssef</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Said</surname>
            ,
            <given-names>I. Chuan</given-names>
          </string-name>
          ,
          <article-title>"The Development of an Autonomous Personal Mobile Robot System for Land Mines Detection on Uneven Terrain"</article-title>
          ,
          <source>An Experience Conference on Intelligent Systems and Robotics</source>
          , u-tokyo.ac.jp,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Gonzalez de Santos</surname>
          </string-name>
          , E. Garcia,
          <string-name>
            <given-names>J.</given-names>
            <surname>Estremera</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Armada</surname>
          </string-name>
          ,
          <article-title>"DYLEMA: Using walking robots for landmine detection and location"</article-title>
          <source>International Journal of Systems Science, Informa UK Limited</source>
          ,
          <volume>36</volume>
          , PP.
          <fpage>545</fpage>
          -
          <lpage>558</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Robledo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Carrasco</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Mery</surname>
          </string-name>
          ,
          <article-title>"A survey of land mine detection technology"</article-title>
          ,
          <source>International Journal of Remote Sensing</source>
          ,
          <volume>30</volume>
          , pp.
          <fpage>2399</fpage>
          -
          <lpage>2410</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Nicoud</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Habib</surname>
          </string-name>
          ,
          <article-title>"The Pemex-B autonomous demining robot: Perception and Navigation strategies"</article-title>
          ,
          <source>Proceedings of the International Conference on Intelligent Robots and Systems</source>
          ,
          <volume>1</volume>
          , pp.
          <fpage>419</fpage>
          -
          <lpage>424</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Santana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Barata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mestre</surname>
          </string-name>
          , J. Lisboa, and L. Flores,
          <article-title>"A multirobot system for landmine detection"</article-title>
          ,
          <source>10th IEEE Conference on Emerging Technologies and Factory Automation</source>
          , ETFA
          , Vol.
          <volume>1</volume>
          , p.
          <fpage>8</fpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Mats</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>Norwegian peoples IRAQ drone use and lessons learned. Norwegian People's Aid Workshop on Lessons Learned from the Use of Unmanned Aerial Vehicles for the Identification and Assessment of Explosive Devices Threats</article-title>
          , Presentation, Podgorica, Montenegro,
          <fpage>16</fpage>
          -
          <lpage>17</lpage>
          October
          <year>2019</year>
          .
          <article-title>Toolbox Implementation for Removal of Anti-Personnel Mines, Sub-Munitions and UXO-TIRAMISU, EU FP7 Project 2012-2015</article-title>
          , Grant Agreement Number 284747. Available online: http://www.fp7-tiramisu.eu/ (accessed on 9 January
          <year>2021</year>
          ). Fardoulis,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <article-title>Drones in HMA lessons from the field 2019</article-title>
          .
          <source>In Proceedings of the 7th Mine Action Technology Workshop</source>
          , GCIHD, Basel, Switzerland,
          <fpage>7</fpage>
          -
          <lpage>8</lpage>
          November 2019. Nevard,
          <string-name>
            <given-names>M.</given-names>
            ;
            <surname>Mansel</surname>
          </string-name>
          ,
          <string-name>
            <surname>R.</surname>
          </string-name>
          ; Torbet,
          <string-name>
            <surname>N.</surname>
          </string-name>
          <article-title>Use of aerial imagery in urban survey &amp; use of RPASs in mine Action-Lessons learned from six countries</article-title>
          .
          <source>In Proceedings of the 7th Mine Action Technology Workshop</source>
          , GCIHD, Basel, Switzerland,
          <fpage>7</fpage>
          -
          <lpage>8</lpage>
          November
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>