<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Research on Object Recognition Approaches for Mobile Platforms with Limited Resources</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dmytro Dovhal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Natalia Myronova</string-name>
          <email>natali.myronova@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anzhelika Parkhomenko</string-name>
          <email>parkhomenko.anzhelika@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National University Zaporizhzhia Polytechnic</institution>
          ,
          <addr-line>64, Zhukovskogo str., Zaporizhzhia, 69063</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Applied Sciences and Arts</institution>
          ,
          <addr-line>23, Otto-Hahn str., Dortmund, 44227</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents the results of a study of approaches to object detection for mobile platforms with limited computing resources. The research focuses on the development of software for real-time object identification using computer vision technologies optimised for embedded systems. The selected hardware platform is the ESP32-CAM, a low-power microcontroller with a built-in camera that allows for efficient video stream processing. The proposed approach involves the use of lightweight image processing methods and deep neural networks, in particular YOLO, adapted to work in resource-limited environments. Experiments confirm that the system can be implemented for real-world applications such as automated monitoring, security, and autonomous navigation.</p>
      </abstract>
      <kwd-group>
        <kwd>object detection</kwd>
        <kwd>ESP32-CAM</kwd>
        <kwd>computer vision</kwd>
        <kwd>machine learning</kwd>
        <kwd>OpenCV</kwd>
        <kwd>Python</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Nowadays, the development of software for object detection systems for mobile platforms with
limited resources is driven by the growing demand for autonomous robotic systems in various
industries, such as security [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1, 2, 3, 4</xref>
        ], logistics [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ], agriculture [
        <xref ref-type="bibr" rid="ref10 ref7 ref8 ref9">7, 8, 9, 10</xref>
        ], etc. In addition, such
systems have a dual purpose, being used both in civilian and military applications [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>For civilian purposes, they can be used for emergency response, agricultural automation and
intelligent surveillance. From a military perspective, autonomous platforms with limited resources
play an important role in reconnaissance, target detection and situational awareness on the
battlefield, where real-time data processing is critical [12]. Such mobile platforms equipped with
cameras and sensors are increasingly used to perform complex tasks such as obstacle detection,
mapping and environmental analysis [13, 14]. As these platforms often have limited resources,
especially in terms of computing power and energy consumption, it is important to develop
software solutions that allow for a sufficient level of object identification and recognition at a
minimal cost.</p>
      <p>The use of computer vision algorithms together with optimised software solutions ensures fast
and accurate data processing in real time [15]. This approach enables mobile platforms with limited
hardware resources to perform complex tasks such as object detection, obstacle recognition, and
navigation [16]. Software optimization for low-power devices allows for efficient image processing,
reducing CPU load and minimising power consumption.</p>
      <p>This enables autonomous systems to operate in challenging conditions, increasing the accuracy
and speed of performing various tasks. Therefore, the development of object detection software is a
pressing task.</p>
      <p>The goal of the work is to research and implement object recognition algorithms for mobile
platforms with limited resources, allowing for real-time video stream processing with minimal
computing resources and a high degree of reliability.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Materials and methods</title>
      <sec id="sec-2-1">
        <title>2.1. Software architecture for mobile platforms</title>
        <p>According to the conducted research, the software architecture for mobile platforms (e.g., wheeled
robots, tracked or other autonomous vehicles) is usually built on a modular principle and consists
of the following main components [17]:</p>
        <p>Hardware level that provides the interface with sensors, actuators and other physical
components of the platform (controllers, power modules, servo motors, etc.).</p>
        <p>The system level (middleware) is responsible for the integration of hardware and software, providing standardized interfaces and data transfer (real-time operating systems FreeRTOS, Zephyr, RTEMS; robotics frameworks ROS/ROS2 [18, 19]).</p>
        <p>Control level responsible for decision-making and platform control (PID controllers for
movement, programming of the motion trajectory and obstacle avoidance).</p>
        <p>Localisation and navigation level, which provides platform location and trajectory planning (localisation algorithms, Simultaneous Localization and Mapping (SLAM) and navigation, such as the A Star [20] and Dijkstra algorithms for route planning).</p>
        <p>Sensory data processing level - image processing using computer vision software solutions, data filtering using the Kalman filter [<xref ref-type="bibr" rid="ref21">21</xref>], object recognition.</p>
        <p>Interaction level - development of interfaces for monitoring and managing the platform (web or mobile applications, remote access interfaces, data exchange via the MQTT protocol [22]); a minimal sketch follows below.</p>
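        <p>For illustration, the following minimal sketch shows how the interaction level could publish platform telemetry over MQTT. The broker address, topic name and payload fields are hypothetical, and the paho-mqtt package (v1.x API) is assumed; the paper itself does not include this code.</p>
        <preformat>
# Minimal sketch of the interaction level: publishing platform telemetry
# over MQTT. Broker address, topic and payload fields are hypothetical.
import json
import paho.mqtt.client as mqtt  # paho-mqtt v1.x API assumed

client = mqtt.Client()
client.connect("192.168.0.10", 1883)  # assumed local broker

telemetry = {"battery": 87, "speed": 0.4, "obstacle_detected": False}
client.publish("platform/telemetry", json.dumps(telemetry))
client.disconnect()
        </preformat>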
        <p>The generalised software architecture for mobile platforms is shown in Figure 1. This
architecture can be adapted to a specific mobile platform (ground drone, quadcopter, etc.).</p>
        <p>The main software components for mobile platforms (in particular, ground drones), necessary
for the implementation of autonomous navigation, detection and identification of objects, as well as
real-time data processing, are the following [17]:</p>
        <p>The motion control system is responsible for controlling the platform's motor functions, including speed, direction and manoeuvring, namely: trajectory planning, i.e. calculation of the optimal route based on set goals or received data, and motion control, i.e. control of manoeuvres using feedback from sensors.</p>
        <p>The image processing module is used to collect and analyse visual information from cameras installed on the platform in order to identify objects and detect possible obstacles.</p>
        <p>The sensor integration module processes data from various sensors, such as GPS, LiDAR, ultrasonic and inertial modules, to monitor and map the environment.</p>
        <p>The user interaction module allows operators to remotely control the platform or configure autonomous modes and is implemented through web interfaces, mobile applications or specialised consoles.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Research on hardware aspects for mobile platforms with limited resources</title>
        <p>Mobile platforms with limited resources are the basis for building compact and affordable computer
vision systems. The comparative analysis of the most common platforms: ESP32-Cam, Raspberry Pi
4, Arduino Nano RP2040, and NVIDIA Jetson Nano [23, 24, 25, 26] with limited resources is shown
in Table 1.</p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Comparative analysis of the most common platforms with limited resources</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Platform</th>
                <th>CPU</th>
                <th>Memory</th>
                <th>Connection</th>
                <th>Availability of camera</th>
                <th>Price</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>ESP32-Cam</td>
                <td>Tensilica LX6</td>
                <td>520 KB RAM</td>
                <td>Wi-Fi, Bluetooth</td>
                <td>OV2640 (1600×1200)</td>
                <td>$10-15</td>
              </tr>
              <tr>
                <td>Raspberry Pi 4</td>
                <td>ARM Cortex-A72</td>
                <td>4 GB LPDDR4</td>
                <td>Wi-Fi, Bluetooth, GPIO</td>
                <td>Raspberry Pi Camera (up to 12 Mp)</td>
                <td>$35-70</td>
              </tr>
              <tr>
                <td>Arduino Nano RP2040</td>
                <td>Dual Cortex-M0+</td>
                <td>264 KB SRAM</td>
                <td>USB, GPIO</td>
                <td>Via additional modules</td>
                <td>$10-20</td>
              </tr>
              <tr>
                <td>NVIDIA Jetson Nano</td>
                <td>Cortex-A57, 128 CUDA cores</td>
                <td>4 GB LPDDR4</td>
                <td>Ethernet, GPIO</td>
                <td>MIPI CSI-2</td>
                <td>$99</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The analysis showed that additional modules are needed to connect a camera to the Arduino Nano RP2040 platform, so to simplify the task it is advisable to consider the ESP32-Cam, Raspberry Pi 4 and NVIDIA Jetson Nano platforms for further research. The comparison of the characteristics of the selected platforms [23, 24, 25, 26] with limited resources for implementing computer vision is given in Table 2. According to the criterion “TensorFlow, PyTorch, OpenCV support” from Table 2, the following conclusions can be made:</p>
        <p>ESP32-Cam has limited support, as TensorFlow Lite Micro only works with simplified models, while OpenCV is used for basic frame processing. This limitation is due to low computing power and limited memory.</p>
        <p>Raspberry Pi 4 fully supports TensorFlow, PyTorch and OpenCV, enabling the implementation of complex computer vision algorithms, including both training and inference.</p>
        <p>NVIDIA Jetson Nano is the optimal platform for TensorFlow, PyTorch and OpenCV, with its high computing power and GPU hardware support for accelerating neural networks.</p>
        <table-wrap id="tab2">
          <label>Table 2</label>
          <caption>
            <p>Comparison of the characteristics of the selected platforms with limited resources for implementing computer vision</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Platform</th>
                <th>Video streaming support</th>
                <th>Energy consumption</th>
                <th>TensorFlow, PyTorch, OpenCV support</th>
                <th>Application areas</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>ESP32-Cam</td>
                <td>160×120 - 1600×1200, up to 30 FPS</td>
                <td>0.6-0.9 W</td>
                <td>TensorFlow Lite Micro (limited), OpenCV</td>
                <td>IoT systems, surveillance, portable monitoring systems</td>
              </tr>
              <tr>
                <td>Raspberry Pi 4</td>
                <td>640×480 - 4K, up to 60 FPS</td>
                <td>5-8 W</td>
                <td>TensorFlow, PyTorch, OpenCV</td>
                <td>Robotics, home automation, multimedia systems</td>
              </tr>
              <tr>
                <td>NVIDIA Jetson Nano</td>
                <td>1080p - 4K, up to 30 FPS</td>
                <td>10-15 W</td>
                <td>TensorFlow, PyTorch, OpenCV</td>
                <td>Computer vision, deep learning, complex robotics systems</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>As is known, the choice of platform depends on the project budget and the requirements for supporting machine learning algorithms.</p>
        <p>In terms of price, the ESP32-Cam is the cheapest platform, making it an attractive choice for low-budget projects. The Raspberry Pi 4 is in the middle price range, offering flexibility and the ability to implement more complex projects. The NVIDIA Jetson Nano is at the higher end of the price range, but its high power justifies the cost for compute-intensive projects.</p>
        <p>Thus, the ESP32-Cam is suitable for low-cost projects that require minimal image processing and remote control. The Raspberry Pi 4 is suitable for more complex tasks that require high computing power and the ability to work with large cameras. The Jetson Nano is suitable for machine learning projects that require high performance and real-time video processing.</p>
        <p>Therefore, the ESP32-Cam platform was chosen for the study, since it was necessary to implement an object detection system with limited use of hardware resources. The ESP32-Cam provides the required functionality through a compact design, energy efficiency and sufficient computing power for basic computer vision tasks. In addition, its low cost and availability make it an optimal choice for developing systems on a budget. The platform supports video stream processing using the built-in OV2640 camera module and data transmission via Wi-Fi. This allows integration with other systems, transmission of the video stream to a server for further processing, and implementation of tasks that require minimal energy consumption.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Research of the features of integration of machine learning technologies for mobile platforms with limited resources</title>
        <p>At the research stage, the following technologies and frameworks were considered to solve the
project tasks: ESP-WHO (ESP-IDF), TensorFlow Lite, OpenCV.</p>
        <p>ESP-WHO (ESP-IDF) is a specialised computer vision framework developed by Espressif for use
on the ESP32 [27]. It is part of the ESP-IDF (Espressif IoT Development Framework) ecosystem and
provides integration with camera modules such as the OV2640 for basic image processing tasks.</p>
        <p>TensorFlow Lite is a simplified version of the TensorFlow framework optimized to run on
devices with limited resources such as mobile devices [28]. For microcontrollers, including ESP32,
the TensorFlow Lite for Microcontrollers version is used.</p>
        <p>OpenCV (Open Source Computer Vision Library) is a popular computer vision library that
supports a wide range of image and video operations. It is widely used for processing video
streams, object recognition and working with machine learning models.</p>
        <p>The results of the comparison of machine learning technologies and frameworks for mobile
platforms with limited resources are presented in Table 3.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Related works</title>
      <p>The use of image processing approaches in object detection systems for mobile platforms is an important component, especially considering the limited resources of such platforms. The main tasks include data preparation, pre-processing to improve image quality, and the use of object detection algorithms.</p>
      <p>Data preparation for the machine learning model consists of frame size preparation and image normalisation [29]. For most machine learning models to function properly, input images must be of a fixed size. This is achieved by scaling the image, which allows the frames to be adjusted to the required dimensions without losing important information. Image normalization involves bringing pixel values into the range [0, 1]. This helps prevent large discrepancies between pixel values and ensures stable performance of the machine learning model.</p>
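      <p>For illustration, a minimal OpenCV/numpy sketch of these two preparation steps is given below; the input file name is hypothetical, and the 416×416 target size matches the YOLO input used in Section 4.</p>
      <preformat>
# Minimal sketch of frame preparation: scaling to a fixed size and
# normalizing pixel values into the range [0, 1].
import cv2
import numpy as np

frame = cv2.imread("frame.jpg")             # hypothetical input frame
resized = cv2.resize(frame, (416, 416))     # fixed input size for the model
normalized = resized.astype(np.float32) / 255.0  # pixel values in [0, 1]
      </preformat>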
      <p>Pre-processing to improve image quality includes the use of noise filtering and lighting
correction.</p>
      <p>Noise filtering reduces noise that can degrade recognition accuracy. The following filters can be distinguished [30]:</p>
      <p>The Gaussian filter is used to smooth images and reduce fine noise; it works on the basis of a Gaussian distribution, which determines pixel values depending on their neighbours.</p>
      <p>The median filter is used to remove image noise while preserving sharp edges, and is especially effective for images with point noise.</p>
      <p>Lighting correction involves adjusting the brightness and contrast of an image and is performed
using histogram equalization, a technique that increases contrast by evenly distributing pixel
intensities, thereby improving image quality.</p>
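      <p>A minimal OpenCV sketch of these pre-processing steps is given below; the kernel sizes are illustrative choices, not values prescribed by the paper.</p>
      <preformat>
# Minimal sketch of pre-processing: noise filtering and lighting correction.
import cv2

frame = cv2.imread("frame.jpg")  # hypothetical input frame

smoothed = cv2.GaussianBlur(frame, (5, 5), 0)  # Gaussian filter for fine noise
denoised = cv2.medianBlur(frame, 5)            # median filter for point noise

# Histogram equalization operates on single-channel images, so the frame
# is converted to grayscale first.
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
equalized = cv2.equalizeHist(gray)
      </preformat>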
      <p>According to the conducted research, the most common algorithms for detecting objects in images that can be implemented for mobile platforms are: Haar cascades [31, 32], Histogram of Oriented Gradients (HOG), You Only Look Once (YOLO) [33], Single Shot Multibox Detector (SSD) and Region-based Convolutional Neural Networks (Faster R-CNN).</p>
      <p>Haar cascades are a classical object recognition method [31, 32] that uses Haar-like features to detect objects such as faces, cars, etc. The algorithm uses a cascade of simple classifiers that sequentially filter the frame, quickly discarding unnecessary parts of the image.</p>
      <p>HOG is a method that detects objects by analysing histograms of oriented gradients in local
areas of an image. It is particularly effective at detecting people in the frame by recognizing
characteristic shape patterns.</p>
      <p>YOLO is one of the most popular object detection methods [33]. Unlike traditional methods such as Haar cascades, YOLO analyses the image as a whole, dividing it into a grid and detecting several objects simultaneously. This approach makes it possible to process large images quickly and work in real time.</p>
      <p>SSD is another fast algorithm for real-time object detection. It is similar to YOLO, but uses
different sizes of “anchors” to detect objects. SSD is faster than most methods on medium and low
resolution images.</p>
      <p>Faster R-CNN is a more complex model for object detection that uses region proposals to find potential objects. Compared to other methods, it has high accuracy, but requires significantly more resources.</p>
      <p>The results of the comparison of object detection algorithms in images are presented in Table 4.</p>
      <p>To implement the object detection system on the ESP32-CAM platform, it was decided to
choose the YOLO algorithm due to its real-time processing speed. One of the main advantages of
YOLO is its ability to perform object detection in a single pass through the network, providing high
speed and real-time capabilities. This is extremely important for the ESP32-Cam, as the platform
has limited computing resources but requires the ability to process video streams with minimal
delays. Unlike models such as Faster R-CNN that rely on pre-generated region proposals to detect objects, YOLO processes the image in a single pass without complex preprocessing steps, speeding up detection, which is essential for running on the ESP32-Cam. Despite hardware limitations (520 KB of RAM, 4 MB of flash memory), the ESP32-Cam can use optimized versions of YOLO models such as Tiny YOLO or YOLOv4-tiny, which have a reduced
number of parameters and require less memory and processing power. One of the reasons for
choosing YOLO for the ESP32-Cam is the ability to use TensorFlow Lite Micro, which supports
simplified models including YOLO. This allows inference to be performed on microcontrollers with
limited resources. YOLO is also capable of detecting multiple objects simultaneously in a single
frame, which is important for many real-world applications such as monitoring multiple objects at
the same time. This allows ESP32-Cam to be used for tasks where it is necessary to simultaneously
identify different types of objects in a video stream.</p>
      <p>YOLO offers a wide range of tools and libraries for implementation on various platforms,
including support for libraries such as OpenCV, allowing for easy integration with platforms such
as the ESP32-Cam.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Development of an algorithm based on the YOLO computer vision model for ESP32-Cam</title>
      <p>Detection and recognition of objects in a real-time video stream using the YOLO computer vision
model was implemented with the ESP32-Cam module, with subsequent display of results on the
screen. The algorithm detects objects in the image, classifies them into categories and displays the
results as rectangles around the detected objects with corresponding labels. Below is a description
of the algorithm.</p>
      <p>Step 1. Importing libraries and configuring the video stream. The OpenCV library cv2 is used to work with images and video. The numpy library is required for the numeric arrays processed by the neural network. The URL for connecting to the ESP32-Cam video stream is then configured.</p>
      <preformat>
import cv2
import numpy as np

url = "http://192.168.0.102:81/stream"
      </preformat>
      <p>Step 2. Loading the YOLO model. The pre-trained YOLO model is loaded from the yolov3.weights file and the yolov3.cfg configuration file. If loading succeeds, a confirmation message is displayed; otherwise, an error message is shown on the screen.</p>
      <preformat>
net = cv2.dnn.readNet(
    r"C:\Users\Dovgal Dima\Desktop\esp32cam_video_stream_on_web_server\CameraWebServer\yolov3.weights",
    r"C:\Users\Dovgal Dima\Desktop\esp32cam_video_stream_on_web_server\CameraWebServer\yolov3.cfg")
      </preformat>
      <p>Step 3. Getting layer names. The names of all network layers are obtained and the output layers that will be used to process the results are identified.</p>
      <preformat>
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]
      </preformat>
      <p>Step 4. Loading object classes. The list of object classes is loaded from coco.names. Each class corresponds to a type of object that the model can recognize (e.g., person, car, etc.).</p>
      <preformat>
with open(r"C:\Users\Dovgal Dima\Desktop\esp32cam_video_stream_on_web_server\CameraWebServer\coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]
      </preformat>
      <p>Step 5. Opening the video stream. The connection is established with the video stream at the specified URL, and the success of opening the stream is verified.</p>
      <preformat>
cap = cv2.VideoCapture(url)
      </preformat>
      <p>Step 6. Reading and processing frames. While the stream is open, frames are continuously read from the video stream. If a frame cannot be retrieved, the loop terminates.</p>
      <preformat>
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
      </preformat>
      <p>Step 7. Frame preprocessing. The frame is converted into a format suitable for the neural network input: it is resized to 416×416 pixels and normalized (the scale factor 0.00392 ≈ 1/255 brings pixel values into the range [0, 1]).</p>
      <preformat>
blob = cv2.dnn.blobFromImage(frame, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
      </preformat>
      <p>Step 8. Object prediction. The input data is fed to the network, which returns predictions for each frame.</p>
      <preformat>
net.setInput(blob)
outs = net.forward(output_layers)
      </preformat>
      <p>Step 9. Processing predictions. The output data is processed to calculate the coordinates and dimensions of the detected objects.</p>
      <preformat>
for out in outs:
    for detection in out:
        ...  # the loop body is completed in the sketch below
      </preformat>
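      <p>The paper does not show the body of this loop; the following minimal sketch completes it in a way consistent with the variables used in Step 10 (the names boxes, confidences, class_ids, color and font, the 0.5 confidence threshold, and the relative-coordinate decoding are assumptions based on the standard YOLOv3 output format).</p>
      <preformat>
# Assumed loop body: filter detections by confidence and collect bounding
# boxes in pixel coordinates for the NMS step that follows.
height, width = frame.shape[:2]
boxes, confidences, class_ids = [], [], []
font = cv2.FONT_HERSHEY_SIMPLEX   # assumed font for labels
color = (0, 255, 0)               # assumed box colour

for out in outs:
    for detection in out:
        scores = detection[5:]                # per-class scores
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        if confidence > 0.5:
            # YOLO returns the box centre and size relative to the frame.
            cx, cy = int(detection[0] * width), int(detection[1] * height)
            w, h = int(detection[2] * width), int(detection[3] * height)
            boxes.append([cx - w // 2, cy - h // 2, w, h])
            confidences.append(confidence)
            class_ids.append(class_id)
      </preformat>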
      <p>Step 10. Filtering and displaying results. Non-Maximum Suppression (NMS) is applied to remove redundant bounding boxes.</p>
      <preformat>
indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
for i in indexes:
    x, y, w, h = boxes[i]
    label = str(classes[class_ids[i]])
    cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
    cv2.putText(frame, f"{label} {int(confidences[i] * 100)}%", (x, y - 10), font, 1, color, 2)
      </preformat>
      <p>Bounding boxes with labels indicating class names and recognition confidence are drawn around the detected objects.</p>
      <p>Step 11. Displaying frames on the screen. The processed frame is displayed on the screen; pressing 'q' exits the loop.</p>
      <preformat>
cv2.imshow("ESP32-CAM Object Detection", frame)
if cv2.waitKey(1) &amp; 0xFF == ord('q'):
    break
      </preformat>
      <p>Step 12. Releasing computational resources. Upon termination, resources are released: the video stream and the display windows are closed.</p>
      <preformat>
cap.release()
cv2.destroyAllWindows()
      </preformat>
      <p>Thus, an algorithm has been developed consisting of the following main stages: loading the model, processing the video stream, and displaying the results. Based on this algorithm, a system for real-time object recognition on a video stream was implemented using the YOLO model.</p>
      <p>Additionally, a web application was developed using the Django framework to display the video
stream from the ESP32-CAM camera on a web page [34].</p>
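      <p>The web application's source is not reproduced in the paper; the following minimal sketch shows one way such a Django view could relay the MJPEG stream to a web page (the view name, camera URL and use of the requests package are assumptions).</p>
      <preformat>
# Hypothetical Django view that relays the ESP32-CAM MJPEG stream to the
# browser. The camera URL and view name are illustrative assumptions.
import requests
from django.http import StreamingHttpResponse

ESP32_STREAM_URL = "http://192.168.0.102:81/stream"

def camera_stream(request):
    upstream = requests.get(ESP32_STREAM_URL, stream=True)
    return StreamingHttpResponse(
        upstream.iter_content(chunk_size=4096),
        content_type=upstream.headers.get(
            "Content-Type", "multipart/x-mixed-replace"),
    )
      </preformat>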
    </sec>
    <sec id="sec-5">
      <title>5. Experiments and results</title>
      <p>The study of computing resources usage was conducted based on the following parameters:</p>
      <p>Frame processing time: during the test, the average processing time of one frame was measured for two video stream resolutions (320×240 and 640×480).</p>
      <p>Memory usage: the RAM and flash memory usage of the ESP32-CAM was monitored.</p>
      <p>ESP32-CAM CPU load: to evaluate performance, the CPU load was monitored for the two video stream resolutions (320×240 and 640×480) and the average CPU load was determined.</p>
      <p>Power consumption: the power consumption was measured with a multimeter during system operation; the experiment showed that the average power consumption of the system was 0.65 W (at a 5 V power supply).</p>
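      <p>For reference, the average frame processing time can be measured on the host side with a sketch like the one below; the instrumentation points and the 200-frame sample size are assumptions, not the paper's actual measurement procedure.</p>
      <preformat>
# Minimal sketch for measuring average frame processing time on the host
# (the stream URL repeats the one used in Section 4).
import time
import cv2

cap = cv2.VideoCapture("http://192.168.0.102:81/stream")
frame_times = []

for _ in range(200):          # sample 200 frames
    ret, frame = cap.read()
    if not ret:
        break
    start = time.perf_counter()
    # ... run the detection pipeline from Section 4 on the frame here ...
    frame_times.append(time.perf_counter() - start)

cap.release()
print(f"Average frame time: {sum(frame_times) / len(frame_times):.4f} s")
      </preformat>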
      <p>To evaluate the accuracy of recognition of objects of a given class by the developed system,
experiments were conducted on data sets with various objects: simple objects (people) and complex
objects (small items, low-contrast objects). For simple objects, the recognition accuracy was 94%,
while for complex objects, the accuracy dropped to 77%, indicating the limitations of the model
when processing more complex types of data.</p>
      <p>Additional experiments were conducted to assess the impact of lighting conditions on the
recognition accuracy of both simple and complex objects. These experiments showed that as
illumination levels decreased, the accuracy of object recognition decreased.</p>
      <p>The results of the experimental research of the object detection system for mobile platforms
based on the ESP32-CAM are presented in Table 5. They confirmed that the system is capable of
performing basic functions with high accuracy and stability under normal operating conditions. All
core functions of the system, such as video stream capture, data transmission to the server, object
detection, and result visualization, work correctly and meet the specified requirements.</p>
      <p>Testing has shown that the system can recognize objects in stable lighting conditions at the
proper level, but in challenging conditions such as low light or noise, recognition accuracy
decreases.</p>
      <p>Testing was also conducted to assess the stability of the results, which confirmed that the
system demonstrates minimal deviations in results upon repeated testing (less than 1%). This
demonstrates a high level of stability and reliability of the software under the same input data
conditions.</p>
      <p>Testing the system's performance under weak Wi-Fi conditions showed that at reduced data transfer rates (less than 5 Mbps), the system experiences frame transfer delays and periodic loss of images, which reduces the efficiency of object recognition. This indicates that the system depends on a stable network connection.</p>
      <p>Experimental studies of computing resources usage have shown that the ESP32-CAM has limitations in terms of computing power. The system is capable of processing a video stream at a resolution of 320×240; however, when working with higher resolutions (640×480), there is a significant increase in CPU load and frame processing time. This indicates the need to optimise the algorithm to make more efficient use of limited resources.</p>
      <p>The results of power consumption experiments showed that the average power consumption of
the system is 0.65W, which is low enough for autonomous operation. However, for long work
sessions, it is important to reduce power consumption through additional optimisation measures.</p>
      <table-wrap id="tab5">
        <label>Table 5</label>
        <caption>
          <p>Results of the experimental research of the object detection system for mobile platforms based on the ESP32-CAM</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Parameter</th>
              <th>Evaluation method</th>
              <th>Result</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>CPU load of ESP32-CAM</td>
              <td>Evaluation of CPU load during frame processing</td>
              <td>320×240: 47.5 %; 640×480: 48.54 %</td>
            </tr>
            <tr>
              <td>Power consumption</td>
              <td>Measurement of power consumption during system operation</td>
              <td>Average consumption: 0.65 W</td>
            </tr>
            <tr>
              <td>Accuracy of simple object recognition</td>
              <td>Evaluation of object identification in the conditions of real data and comparison of results with reference data</td>
              <td>94 %</td>
            </tr>
            <tr>
              <td>Accuracy of complex object recognition</td>
              <td>Evaluation of object identification in the conditions of real data and comparison of results with reference data</td>
              <td>77 %</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>As a result of the work, a prototype of an object detection system was developed that allows real-time object identification using computer vision algorithms for mobile platforms with limited resources.</p>
      <p>An analysis of mobile platforms with limited resources, image processing methods and machine learning technologies was conducted. Based on a comparative study of popular platforms, the ESP32-Cam was chosen as the optimal option for developing an object detection system with limited hardware resources, taking into account its low cost, compactness and energy efficiency.</p>
      <p>An application has been developed for the ESP32-Cam module that provides remote control of a
ground drone and real-time environmental monitoring. The system enables video streaming via
Wi-Fi, providing high image quality through the use of the OV2640 camera.</p>
      <p>The developed algorithm consists of the following main stages: loading the model, processing the video stream, and displaying the results. Based on this algorithm, a system for real-time object recognition was implemented using the YOLO model. A web application has been developed using the Django framework that displays the video stream from the ESP32-CAM camera on a web page.</p>
      <p>Testing and experimental study of the developed object detection system for mobile platforms
based on ESP32-CAM was carried out. It has been confirmed that the developed system has a
sufficient level of performance for use in stable conditions, however, further improvement is
required for operation in more challenging conditions (poor lighting, low data transfer rate).</p>
      <p>The scientific novelty of the work lies in the fact that using the YOLO model, an algorithm for
recognizing objects in real-time in a video stream with high recognition reliability has been
developed.</p>
      <p>The practical value of the results of the work lies in the fact that the developed software can be
used as a tracking and object recognition system for real-time environmental monitoring for
mobile platforms with limited resources.</p>
      <p>Future research will focus on further improving and expanding the system's functionality under
limited computing resources, optimizing resource usage, increasing recognition accuracy in
challenging conditions, and ensuring stable operation at low Wi-Fi signal levels.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This work is partly carried out with the support of the Erasmus+ KA2 project WORK4CE “Cross-domain competences for healthy and safe work in the 21st century” (619034-EPP-1-2020-1-UA-EPPKA2-CBHE-JP) and the DAAD project ViMUk.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <sec id="sec-8-1">
        <title>The authors have not employed any Generative AI tools.</title>
        <p>[12] J. Gong, J. Yan, D. Kong, D. Li, Introduction to drone detection radar with emphasis on
automatic target recognition (ATR) technology, Electrical Engineering and Systems Science:
Signal Processing (2023) 1-17. doi:10.48550/arXiv.2307.10326.
[13] L. Nobile, M Randazzo., M. Colledanchise, L Monorchio., W. Villa, F. Puja, L. Natale, Active
exploration for obstacle detection on a mobile humanoid robot, Actuators 10(9) (2021) 1-20.
doi:10.3390/act10090205.
[14] L. Mochurad, Y. Hladun, R. Tkachenko, An obstacle-finding approach for autonomous mobile
robots using 2D LiDAR data, Big Data and Cognitive Computing 7(43) (2023) 1-16.
doi:10.3390/bdcc7010043.
[15] C. Zaharia, V. Popescu, F. Sandu, Hardware–software partitioning for real-time object
detection using dynamic parameter optimization, Sensors (Basel), 23(10) (2023) 1-25.
doi:10.3390/s23104894.
[16] A. Loganathan, N. S. Ahmad, A systematic review on recent advances in autonomous mobile
robot navigation, Engineering Science and Technology, an International Journal, 40 (2023)
126. doi:10.1016/j.jestch.2023.101343.
[17] H. Andreasson, G. Grisetti, T. Stoyanov, A. Pretto, Software architectures for mobile robots, in:
Ang, M.H., Khatib, O., Siciliano, B. (Eds.) Encyclopedia of Robotics, Springer, Berlin,
Heidelberg, 2023, pp.1-11. doi: 10.1007/978-3-642-41610-1_160-1.
[18] A. Eliasz, Zephyr RTOS embedded C programming, Apress Berkeley, CA, 2024, 677 p.</p>
        <p>doi:10.1007/979-8-8688-0107-5.
[19] G. Bloom, J. Sherrill, T. Hu, I. C. Bertolotti, Real-time systems development with RTEMS and
multicore processors, CRC Press, Boca Raton, 2020, 534 p. doi:10.1201/9781351255790.
[20] Y. Yan, Research on the A Star algorithm for finding shortest path, Highlights in Science</p>
        <p>Engineering and Technology 46 (2023) 154-161. doi:10.54097/hset.v46i.7697.
[21] D. Kalita, P. Lyakhov, Moving object detection based on a combination of Kalman filter and
median filtering, Big Data and Cognitive Computing, 6(4) (2022) 1-13. doi:10.3390/bdcc6040142.
[22] A. Aleesha, C. A. Laseena, MQTT protocol for resource constrained IoT applications: a review,
in: Proceedings of the International conference on Systems, energy and environment (ICSEE
2022), Kannur, India, 2022, pp. 1–7. doi:10.2139/ssrn.4299372
[23] H. Dietz, D. Abney, P. Eberhart, N. Santini, W. Davis, E. Wilson, M. McKenzie, ESP32-CAM as
a programmable camera research platform, Electronic Imaging 34 (2022) 232-1–232-6.
doi:10.2352/EI.2022.34.7.ISS-232.
[24] R. Chin, Spy camera DIY wireless using ESP32 CAM and Android, Independently published,
2022, 54 p.
[25] S. Smith, RP2040 Assembly language programming, Apress Berkeley, CA, 2022, 320 p.</p>
        <p>doi:10.1007/978-1-4842-7753-9.
[26] G. Parthasarathi, U. Prethashree, A. Harish, R. Moulieshwaran, M. Shunmugadinesh, Envision
– an object detection system using Jetson Nano, in: Proceedings of the 2nd International
conference on Inventive computing and informatics (ICICI), Bangalore, India, 2024, pp.
542545. doi:10.1109/ICICI62254.2024.00094.
[27] H. Fairhead, Programming the ESP32 in C using the Espressif IDF, I/O Press, 2024, 445 p.
[28] S. E. Adi, A. J. Casson, Design and optimization of a TensorFlow Lite deep learning neural
network for human activity recognition on a smartphone, in: Proceedings of the 43rd Annual
International conference of the IEEE Engineering in Medicine &amp; Biology Society (EMBC),
Mexico, 2021, pp. 7028-7031. doi:10.1109/EMBC46164.2021.9629549.
[29] J. Howse, Learning OpenCV 4 computer vision with Python 3, Packt Publishing, 2020, 372 p.
[30] K. Shi, Comparison of image enhancement algorithms based on denoising and edge detection,
Applied and Computational Engineering, 133 (1) 2025, 174-184.
doi:10.54254/27552721/2025.20700.
[31] L. Arreola, G. Gudiño, G. Flores, Object recognition and tracking using Haar-like features
cascade classifiers: application to a quad-rotor UAV, in: Proceedings of the 8th International
conference on Control, decision and information technologies (CoDIT), Istanbul, Turkey, 2022,
pp. 45-50. doi:10.1109/CoDIT55151.2022.9803981.
[32] S. Gharge, A. Patil, S. Patel, V. Shetty, N. Mundhada, Real-time object detection using Haar
cascade classifier for robot cars, in: Proceedings of the 4th International conference on
Electronics and sustainable communication systems (ICESC), Coimbatore, India, 2023,
pp. 6470. doi:10.1109/ICESC57686.2023.10193401.
[33] Z. Guan, Real time object recognition based on YOLO model, Theoretical and Natural Science
28(1) (2023) 137-143. DOI: 10.54254/2753-8818/28/20230450
[34] D. Dovhal, Object Detection, 2024. URL:
https://github.com/Dmitriy-1986/object_detection/tree/main.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>U.</given-names>
            <surname>Sharma</surname>
          </string-name>
          , U.S. Medasetti,
          <string-name>
            <given-names>T.</given-names>
            <surname>Deemyad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mashal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Yadav</surname>
          </string-name>
          ,
          <article-title>Mobile robot for security applications in remotely operated advanced reactors</article-title>
          ,
          <source>Applied Sciences 14(6)</source>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>33</lpage>
          . doi:10.3390/app14062552.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.-N.</given-names>
            <surname>Pham</surname>
          </string-name>
          , D.-T. Mai,
          <article-title>A mobile robot design for home security systems</article-title>
          , Engineering,
          <source>Technology &amp; Applied Science Research</source>
          <volume>14</volume>
          (
          <issue>4</issue>
          ) (
          <year>2024</year>
          )
          <fpage>14882</fpage>
          -
          <lpage>14887</lpage>
          . doi:10.48084/etasr.7336.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Prykhodchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. P.</given-names>
            <surname>Rocha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Couceiro</surname>
          </string-name>
          ,
          <article-title>People detection by mobile robots doing automatic guard patrols</article-title>
          ,
          <source>in: Proceedings of the 2020 IEEE International conference on Autonomous robot systems and competitions (ICARSC)</source>
          ,
          <source>Ponta Delgada, Portugal</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>300</fpage>
          -
          <lpage>305</lpage>
          . doi:10.1109/ICARSC49921.2020.9096147.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kurniawan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A. S.</given-names>
            <surname>Gunawan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hartanto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mili</surname>
          </string-name>
          , W. Budiharto,
          <article-title>Swarm of mobile robots for security surveillance based on Android smartphone and Firebase</article-title>
          ,
          <source>International Journal of Intelligent Systems and Applications in Engineering</source>
          <volume>11</volume>
          (
          <issue>4</issue>
          ) (
          <year>2023</year>
          )
          <fpage>810</fpage>
          -
          <lpage>815</lpage>
          . https://ijisae.org/index.php/IJISAE/article/view/3614.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Fragapane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.H.</given-names>
            <surname>Hvolby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sgarbossa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. O.</given-names>
            <surname>Strandhagen</surname>
          </string-name>
          ,
          <article-title>Autonomous mobile robots in hospital logistics</article-title>
          ,
          <source>in: Proceedings of the IFIP International conference on Advances in production management systems (APMS)</source>
          ,
          <source>Novi Sad, Serbia</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>672</fpage>
          -
          <lpage>679</lpage>
          . doi:10.1007/978-3-030-57993-7_76.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lackner</surname>
          </string-name>
          , J. Hermann,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kuhn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Palm</surname>
          </string-name>
          ,
          <article-title>Review of autonomous mobile robots in intralogistics: state-of-the-art, limitations and research gaps</article-title>
          ,
          <source>Procedia CIRP 130</source>
          (
          <year>2024</year>
          )
          <fpage>930</fpage>
          -
          <lpage>935</lpage>
          . doi:10.1016/j.procir.2024.10.187.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Shamshiri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. A.</given-names>
            <surname>Hameed</surname>
          </string-name>
          ,
          <article-title>Mobile robots for digital farming</article-title>
          , CRC Press,
          <year>2024</year>
          , 208 p. doi:10.1201/9781003306283.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>X.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <article-title>A review of the large-scale application of autonomous mobility of agricultural platform</article-title>
          ,
          <source>Computers and Electronics in Agriculture</source>
          <volume>206</volume>
          (
          <year>2023</year>
          )
          107628. doi:10.1016/j.compag.2023.107628.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Satoh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Usami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tatsumi</surname>
          </string-name>
          ,
          <article-title>A multi-purpose autonomous mobile robot as a part of agricultural decision support systems</article-title>
          ,
          <source>in: Proceedings of the 2023 IEEE 19th International conference on Automation science and engineering (CASE)</source>
          , Auckland, New Zealand,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          . doi:10.1109/CASE56687.2023.10260483.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D. F.</given-names>
            <surname>Yépez Ponce</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. V.</given-names>
            <surname>Salcedo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.D.</given-names>
            <surname>Rosero-Montalvo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sanchis</surname>
          </string-name>
          ,
          <article-title>Mobile robotics in smart farming: current trends and applications</article-title>
          .
          <source>Frontiers in Artificial Intelligence</source>
          <volume>6</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          . doi:10.3389/frai.2023.1213330.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>V.</given-names>
            <surname>Koval</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Semenenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Baranov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ostrovskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Akinina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Siechenev</surname>
          </string-name>
          ,
          <article-title>The role and place of robotic systems in modern wars and armed conflicts: theoretical aspect</article-title>
          ,
          <source>Social Development and Security</source>
          <volume>13</volume>
          (
          <issue>5</issue>
          ) (
          <year>2023</year>
          )
          <fpage>256</fpage>
          -
          <lpage>276</lpage>
          . doi:10.33445/sds.2023.13.5.24.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] J. Gong, J. Yan, D. Kong, D. Li, Introduction to drone detection radar with emphasis on automatic target recognition (ATR) technology, Electrical Engineering and Systems Science: Signal Processing (2023) 1-17. doi:10.48550/arXiv.2307.10326.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] L. Nobile, M. Randazzo, M. Colledanchise, L. Monorchio, W. Villa, F. Puja, L. Natale, Active exploration for obstacle detection on a mobile humanoid robot, Actuators 10(9) (2021) 1-20. doi:10.3390/act10090205.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] L. Mochurad, Y. Hladun, R. Tkachenko, An obstacle-finding approach for autonomous mobile robots using 2D LiDAR data, Big Data and Cognitive Computing 7(43) (2023) 1-16. doi:10.3390/bdcc7010043.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] C. Zaharia, V. Popescu, F. Sandu, Hardware–software partitioning for real-time object detection using dynamic parameter optimization, Sensors 23(10) (2023) 1-25. doi:10.3390/s23104894.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] A. Loganathan, N. S. Ahmad, A systematic review on recent advances in autonomous mobile robot navigation, Engineering Science and Technology, an International Journal 40 (2023) 101343. doi:10.1016/j.jestch.2023.101343.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] H. Andreasson, G. Grisetti, T. Stoyanov, A. Pretto, Software architectures for mobile robots, in: M. H. Ang, O. Khatib, B. Siciliano (Eds.), Encyclopedia of Robotics, Springer, Berlin, Heidelberg, 2023, pp. 1-11. doi:10.1007/978-3-642-41610-1_160-1.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] A. Eliasz, Zephyr RTOS embedded C programming, Apress, Berkeley, CA, 2024, 677 p. doi:10.1007/979-8-8688-0107-5.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] G. Bloom, J. Sherrill, T. Hu, I. C. Bertolotti, Real-time systems development with RTEMS and multicore processors, CRC Press, Boca Raton, 2020, 534 p. doi:10.1201/9781351255790.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] Y. Yan, Research on the A Star algorithm for finding shortest path, Highlights in Science, Engineering and Technology 46 (2023) 154-161. doi:10.54097/hset.v46i.7697.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] D. Kalita, P. Lyakhov, Moving object detection based on a combination of Kalman filter and median filtering, Big Data and Cognitive Computing 6(4) (2022) 1-13. doi:10.3390/bdcc6040142.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] A. Aleesha, C. A. Laseena, MQTT protocol for resource constrained IoT applications: a review, in: Proceedings of the International conference on Systems, energy and environment (ICSEE 2022), Kannur, India, 2022, pp. 1-7. doi:10.2139/ssrn.4299372.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] H. Dietz, D. Abney, P. Eberhart, N. Santini, W. Davis, E. Wilson, M. McKenzie, ESP32-CAM as a programmable camera research platform, Electronic Imaging 34 (2022) 232-1–232-6. doi:10.2352/EI.2022.34.7.ISS-232.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] R. Chin, Spy camera DIY wireless using ESP32 CAM and Android, Independently published, 2022, 54 p.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] S. Smith, RP2040 Assembly language programming, Apress, Berkeley, CA, 2022, 320 p. doi:10.1007/978-1-4842-7753-9.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] G. Parthasarathi, U. Prethashree, A. Harish, R. Moulieshwaran, M. Shunmugadinesh, Envision – an object detection system using Jetson Nano, in: Proceedings of the 2nd International conference on Inventive computing and informatics (ICICI), Bangalore, India, 2024, pp. 542-545. doi:10.1109/ICICI62254.2024.00094.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] H. Fairhead, Programming the ESP32 in C using the Espressif IDF, I/O Press, 2024, 445 p.</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>[28] S. E. Adi, A. J. Casson, Design and optimization of a TensorFlow Lite deep learning neural network for human activity recognition on a smartphone, in: Proceedings of the 43rd Annual International conference of the IEEE Engineering in Medicine &amp; Biology Society (EMBC), Mexico, 2021, pp. 7028-7031. doi:10.1109/EMBC46164.2021.9629549.</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>[29] J. Howse, Learning OpenCV 4 computer vision with Python 3, Packt Publishing, 2020, 372 p.</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>[30] K. Shi, Comparison of image enhancement algorithms based on denoising and edge detection, Applied and Computational Engineering 133(1) (2025) 174-184. doi:10.54254/2755-2721/2025.20700.</mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>[31] L. Arreola, G. Gudiño, G. Flores, Object recognition and tracking using Haar-like features cascade classifiers: application to a quad-rotor UAV, in: Proceedings of the 8th International conference on Control, decision and information technologies (CoDIT), Istanbul, Turkey, 2022, pp. 45-50. doi:10.1109/CoDIT55151.2022.9803981.</mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>[32] S. Gharge, A. Patil, S. Patel, V. Shetty, N. Mundhada, Real-time object detection using Haar cascade classifier for robot cars, in: Proceedings of the 4th International conference on Electronics and sustainable communication systems (ICESC), Coimbatore, India, 2023, pp. 64-70. doi:10.1109/ICESC57686.2023.10193401.</mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>[33] Z. Guan, Real time object recognition based on YOLO model, Theoretical and Natural Science 28(1) (2023) 137-143. doi:10.54254/2753-8818/28/20230450.</mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>[34] D. Dovhal, Object Detection, 2024. URL: https://github.com/Dmitriy-1986/object_detection/tree/main.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>