<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Augmented reality audiostream creation using CNN: boosting inclusion and safety for visually impaired people</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Olexander Mazurets</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olena Sobko</string-name>
          <email>olenasobko.ua@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rostyslav Molchanova</string-name>
          <email>rostyslav0805@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maryna Molchanova</string-name>
          <email>m.o.molchanova@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>MoMLeT-2025: 7th International Workshop on Modern Machine Learning Technologies</institution>
          ,
          <addr-line>June 14, 2025, Lviv-Shatsk</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The paper proposes an approach to boosting the inclusion and safety of visually impaired people by creating a method of augmented reality audiostream creation, in which objects and named persons are identified and classified from a video stream using CNN. The developed method differs from existing ones in two ways: it can classify named individuals in the video stream, boosting the inclusion of visually impaired people, and it prioritizes dangerous objects when generating the audiostream, boosting their safety. In this way, the method contributes to achieving UNDP SDG No. 3, No. 10 and No. 11. To study the method's effectiveness, test software was developed. The applied comparison of the neural network models YOLOv5n, YOLOv5x, YOLOv8n, YOLOv8x, YOLOv11n and YOLOv11x revealed that the YOLOv11x model is better suited for identifying and classifying objects in the environment, with sufficient macrometric values over 80 object classes: Accuracy 74.1%, Precision 73.74%, Recall 65.85%, mAP@0.5 71.34%, and mAP@0.5:0.95 54.85%. The applied results for the CNN neural network model indicate high-quality classification of named persons: Accuracy 97%, Precision 95%, Recall 96%, and F1-score 95%.</p>
      </abstract>
      <kwd-group>
        <kwd>augmented reality</kwd>
        <kwd>visually impaired people</kwd>
        <kwd>audiostream</kwd>
        <kwd>object detection</kwd>
        <kwd>YOLO</kwd>
        <kwd>CNN</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        According to the World Health Organization, there are about 2.2 billion people with visual
impairments in the world, and their number is increasing every year due to population aging and
insufficient access to health services in developing countries [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In Ukraine, there are no official
data on the number of people with visual impairments, but, according to various estimates, their
number varies from 70 to 300 thousand people [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The situation is complicated by military
aggression, which leads to an increase in the number of people with eye injuries. People with
visual impairments face a number of serious problems, especially with regard to orientation in the
environment. The main problems are difficulties with mobility, insufficient accessibility of the
urban environment, limitations in the perception of information and difficulties in social
interaction [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. One of the main problems is the difficulty in identifying physical obstacles. Due
to the lack of visual information, people with visual impairments are forced to rely on auditory
cues, tactile markers, and assistive devices such as canes or guide dogs [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ].
      </p>
      <p>
        There are also challenges with the use of current technologies. For example, while mobile
applications based on artificial intelligence help
with navigation, they often have limited
functionality or low accuracy [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. At the same time, the implementation of such solutions in
everyday life can significantly improve the ability to be autonomous, providing timely information
about objects and obstacles in real time for people with visual impairments, for example, based on
portable devices such as Google Glass, smart canes, etc. [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ].
      </p>
      <p>
        Other problems include psychological barriers [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], such as fear of new routes or public
transport, and a lack of specialists who can provide quality spatial orientation training. The latter is
especially relevant in the context of the growing number of people with visual impairments due to
injuries or diseases [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and has been complicated by the impact of quarantine restrictions [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ]
and propaganda influence [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ], which has given special importance to the issue of inclusion of
people with visual impairments. Although measures are being taken at the state level to increase
inclusion for people with visual impairments, in particular, equipping the urban environment with
tactile elements, ensuring access to education, implementing programs to support employment and
legal protection, these efforts are still insufficient [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. The essence of the problem lies in the lack
of accessibility of information technologies, the lack of modern equipment for the integration of
people with visual impairments into everyday life, as well as the uneven implementation of
inclusive solutions in different regions [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>
        The use of such technologies will contribute to increasing inclusion and safety for people with
visual impairments and creating an environment that takes into account the needs of people with
disabilities, and the created software will help achieve the UN Sustainable Development Goals
(SDGs), in particular SDG No. 3 «Ensure healthy lives and promote well-being for all at all ages», SDG No. 10
«Reduce inequality within and among countries» and SDG No. 11 «Make cities and human settlements
inclusive, safe, resilient and sustainable» [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. In particular,
achieving SDG No. 3 through the development of information technologies that enhance the
mobility and autonomy of people with visual impairments will occur by improving their
psychoemotional state and physical health [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. The achievement of SDG No. 10 is justified by the fact
that the integration of information technologies for people with visual impairments will reduce
inequalities in access to education, employment and social life [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. The use of audio prompts and
automated navigation systems will help ensure safe movement in urban environments to achieve
SDG No. 11 [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
      <p>The research goal is to boost the inclusion and safety of visually impaired people by creating an
audio stream of objects and named persons identified and classified from a video stream using deep
convolutional neural networks.</p>
      <p>The main contribution of the paper is the proposed method of augmented reality audiostream creation
using CNN, which differs from existing ones in that, when classifying objects in a video stream, it is
able to recognize named persons among them, contributing to a higher level of inclusion. In addition,
thanks to the introduced list of priority objects ranked by danger, the proposed method can be used to
increase the safety of people with visual impairments.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>
        The relevance of using information technology to improve inclusion and safety for people with
visual impairments is undeniable. Some researchers focus on integrating multisensory data,
combining images, sounds, and information from different sensors, creating a more complete
picture of the environment [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. Such systems provide adaptive audio feedback according to the
user's needs and conditions. Other researchers are working on developing deep learning-based
systems that use computer vision to recognize objects and create their audio descriptions [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. The
following is a review of current research on the use of artificial intelligence tools and methods for
recognizing objects in the environment to ensure inclusion and safety for people with visual
impairments, as well as identifying named individuals in a video stream.
      </p>
      <p>
        A walking assistance system for people with visual impairments using XR glasses for safe
movement on the street is proposed in the study [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. The mobile-optimized YOLOv8n model was
used for implementation, and three specialized models were developed for pedestrian paths,
transport infrastructure and obstacle recognition, which include 9 classes in total. The average
metrics are Precision 87.05%, Recall 80.8%, F1-Score 84%.
      </p>
      <p>
        The authors of study [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] developed a mobile application that helps people with visual
impairments in their daily lives, increasing their autonomy, mobility and independence. The
authors used YOLOv5 for real-time object detection and Google ML Kit for text-to-speech
conversion. The YOLOv5 model is able to classify 7 classes of objects with an accuracy of 96%.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], an assistance system for visually impaired people is described, which combines an
obstacle detection algorithm with the use of sensors. The proposed system measures the distance
to objects and provides tactile feedback. An additional camera provides contextual
information about the environment using audio instructions. The system is able to recognize 9
objects, for which an average Accuracy of 91.7% was obtained.
      </p>
      <p>
        A voice application for smartphones was created in [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], which helps blind people in object
recognition and navigation in the environment. This system integrates text, people and object
recognition functions, and also uses sensors for obstacle detection, applying the K-nearest neighbor
algorithm for image processing.
      </p>
      <p>
        In paper [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], an automated object identification system for the blind is described, which uses
RFCN and Mask RCNN models for object detection. The RFCN model showed the best results with
a mean average precision (mAP) of 0.825. The system uses a Raspberry Pi with a connected camera
and an ultrasonic sensor to determine the distance, and the information is transmitted to the user
via audio feedback.
      </p>
      <p>
        The authors of the article [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] presented an image-to-speech system for the blind, combining
OpenCV and YOLO to identify objects and create audio descriptions. The system achieves an
Accuracy of 96.60%, but the work only considers the recognition of a limited number of
environmental objects.
      </p>
      <p>
        Paper [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] presents a system for real-time object recognition using the YOLO algorithm optimized
for mobile devices. This system allows blind users to identify objects around them through
text-audio notifications, promoting their autonomy and reducing the need for help from other people or
special devices. The study [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] aims to develop and evaluate a prototype face recognition system
for use in the educational process. For implementation, an Android application was created that
classifies recognized faces. During the research, the authors obtained a face recognition accuracy
rate of 78.57%.
      </p>
      <p>
        In the paper [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ], a new approach to face recognition in video surveillance systems for
university laboratories is proposed, aimed at solving the problems of low resolution and partial
face overlap. For this purpose, RetinaFace-ResNet was reconstructed and combined with
Quality-Adaptive Margin (AdaFace). Experiments showed an accuracy of 96.12% on the WiderFace set and
84.36% in real laboratory conditions.
      </p>
      <p>
        The authors of the study [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ] proposed a new approach to face detection based on deep
learning, aimed at solving problems related to facial expressions, overlap and lighting changes. The
optimized ResNet-50 architecture with hyperparameter selection using the Gray Wolf algorithm
was used. 94% accuracy was achieved on both training and test data.
      </p>
      <p>
        A method combining LBPH and CNN for image preprocessing using equalized histograms was
proposed in the study [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ]. LBPH is used to extract and combine histogram values into a single
vector, which reduces training loss and increases accuracy to 96.5%.
      </p>
      <p>Analysis of recent studies allows us to conclude that there are unresolved problems on the way
to increasing the level of inclusion. In particular, the identification of named persons, which can
promote social communication and a sense of security, has not been implemented. Meanwhile,
existing solutions do not take into account priorities for identified environmental objects, although
the detected objects may have diametrically opposite priorities in different situations.</p>
      <p>In this regard, the study aims to increase the level of inclusion by identifying named persons,
and makes an attempt to increase the level of safety of people with visual impairments by
introducing priorities for identified environmental objects.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <p>Artificial intelligence tools and methods allow us to create systems to support people with special
needs. One important direction is the creation of an augmented reality audio stream that can
increase inclusion and safety for people with visual impairments by analyzing the environment
through a video stream. Figure 1 presents a general approach to converting input information in
the form of a video stream and a prioritized list of dangerous objects for people with visual
impairments into output information in the form of an audio stream.</p>
      <p>In accordance with the presented approach in Figure 1, a diagram of the method of augmented
reality audiostream creation using CNN is shown in Figure 2. As part of the conducted research,
the use of YOLO models for identification and classification of objects from the video stream and
CNN for classification of named individuals is proposed.</p>
      <p>The input data of the method are a video stream, a neural network model for object
identification, a priority list of dangerous objects for people with visual impairments, a delay in
voicing objects, as well as a confidence threshold for identifying objects in the video stream and a
neural network model for classifying named persons. The video stream provides dynamic visual
information about the environment in which a person with visual impairments is located, and
the neural network models process this video stream to identify and classify objects according to a set
confidence threshold, from which an audio stream relevant for people with visual impairments is
formed.</p>
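      <p>For clarity, the input data listed above can be grouped into a single configuration structure. The
sketch below is only an illustration of this grouping; all names (for example, MethodConfig and
danger_priority) are assumptions of this description and do not come from the published software.</p>
      <preformat>
# Illustrative grouping of the method's input data (all names are assumed, not taken from the original software)
from dataclasses import dataclass

@dataclass
class MethodConfig:
    video_source: str = "0"                # camera index or path to a video stream
    detector_weights: str = "yolo11x.pt"   # YOLO model for object identification (assumed file name)
    face_model_path: str = "cnn_faces.h5"  # CNN model for classifying named persons (assumed file name)
    confidence_threshold: float = 0.5      # objects below this score are not voiced
    person_threshold: float = 0.6          # extra threshold before named-person classification
    voicing_delay_s: float = 5.0           # pause between audio-stream updates
    danger_priority: tuple = (
        "bicycle", "car", "motorcycle", "bus",
        "train", "truck", "traffic_light", "stop_sign",
    )
</preformat>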
      <p>
        The first stage of the method is the pre-processing of the video stream. Preprocessing for the
YOLO neural network model [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ] includes preparing data for processing, in particular scaling
frames to a single size that varies between versions but for most of them is 640×640,
normalizing pixel values to reduce variability, and converting images to the appropriate RGB color
space. After that, the data is converted into tensors compatible with the requirements of the neural
network model.
      </p>
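      <p>A minimal sketch of this preprocessing stage, assuming OpenCV and NumPy and the typical 640×640
input size, is given below; the ultralytics package named in Section 4 performs equivalent steps
internally, so the function is illustrative rather than a required component.</p>
      <preformat>
# Sketch of stage 1: frame preprocessing for a YOLO-style model (assumes OpenCV and NumPy)
import cv2
import numpy as np

def preprocess_frame(frame_bgr, size=640):
    """Scale a BGR frame to size x size, convert to RGB, normalize to [0, 1] and build an NCHW tensor."""
    resized = cv2.resize(frame_bgr, (size, size))
    rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)
    normalized = rgb.astype(np.float32) / 255.0
    tensor = np.transpose(normalized, (2, 0, 1))[np.newaxis, ...]  # batch of one frame
    return tensor
</preformat>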
      <p>The second stage includes determining the boundaries of objects in the video-stream frames
received and preprocessed in stage 1, and estimating the probability of their belonging to
a certain class. YOLO performs classification of objects in the frame by one-stage detection,
dividing the image into an S×S grid. The grid size depends on the YOLO versions and the detection
level and can be: 80×80, 40×40, 20×20 cells, where each cell is responsible for predicting bounding
boxes and class probabilities for objects located within its boundaries. The neural network
architecture simultaneously calculates the coordinates of the frames, the degree of confidence in
the presence of the object and its class affiliation, using a forward pass. After that, confidence
thresholds and the Non-Maximum Suppression algorithm are applied to filter out bounding boxes
and leave only the most relevant ones. Using the confidence threshold set by the user (from 0.0 to
1.0), filtering of objects occurs: only those whose probability exceeds this threshold will be selected
for output to the audio stream in the following stages. If an object is detected in the frame that is
identified as a person with an additional threshold value above 0.6, an additional determination of
this person is performed at stage 3.</p>
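      <p>Assuming the ultralytics package named in Section 4, stage 2 can be sketched as follows: the
user-defined confidence threshold is passed to the model, Non-Maximum Suppression is applied
internally, and the additional 0.6 threshold for the «person» class is checked explicitly. The weights
file name and variable names are illustrative assumptions.</p>
      <preformat>
# Sketch of stage 2: one-stage detection with confidence filtering (assumes the ultralytics package)
from ultralytics import YOLO

model = YOLO("yolo11x.pt")  # pre-trained model covering 80 COCO classes (file name assumed)

def detect_objects(frame, conf_threshold=0.5, person_threshold=0.6):
    """Return (class name, confidence, xyxy box) tuples above the confidence threshold."""
    result = model.predict(frame, conf=conf_threshold, verbose=False)[0]  # NMS applied internally
    detections = []
    for box in result.boxes:
        name = result.names[int(box.cls)]
        detections.append((name, float(box.conf), box.xyxy[0].tolist()))
    # persons are passed on to stage 3 only above the additional threshold
    persons = [d for d in detections if d[0] == "person" and d[1] >= person_threshold]
    return detections, persons
</preformat>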
      <p>In the third stage, the named persons are classified. First, the boundary of the identified person
in the frame is determined, after which the probability with which this person belongs to a specific
class is calculated. The class of the named person in the next stage will be announced as "person
is"+name of person. If the person could not be identified, his name will not be announced in stage
4, but "unknown person" will be announced instead. Using YOLO to identify named persons is not
advisable, since this approach is optimized for fast detection of object classes, and not for accurate
recognition of specific persons. YOLO performs well in identifying people in the frame, but its final
layers are not designed to extract unique features necessary for identifying and classifying named
persons. For the identification and classification of named persons, a CNN neural network model is
used in the study. The steps involved in face identification using a cascade classifier based on Haar
functions and the subsequent classification of the named person using CNN are shown in Figure 3.</p>
      <p>
        To classify a person detected in a frame from a video stream, a cascade classifier based on Haar
functions is used to determine the boundaries of the face [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ]. As input parameters, the cascade of
classifiers receives the coordinates of the bounding box of the person that was classified by YOLO.
This allows localizing the area containing the person's face, which is necessary for further
classification of named persons. If the cascade classifier does not determine that there is a face in
the bounding box of the person (for example, the person in the frame has their back turned, or the
face is at an angle that does not allow them to be identified), then the classification of named
persons does not occur, and the class name «person» is provided in the audio output. After
isolating the area with the face, a CNN is used to identify the person by classifying the resulting
image. The CNN generates a vector representation of the face and compares it with the named
persons the classifier was trained on to determine the most likely class match. This is done by using a
softmax function that converts the model's output values into probabilities of belonging to each
class. The maximum value of the softmax function will correspond to the identified class of the
named person, which generates audio output indicating the name of the recognized person.
      </p>
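      <p>A hedged sketch of stage 3 is given below, assuming OpenCV for the Haar cascade and
TensorFlow/Keras for the CNN; the model path, the 96×96 face size and the handling of the class list
are assumptions, not details taken from the original software.</p>
      <preformat>
# Sketch of stage 3: face localization with a Haar cascade and named-person classification with a CNN
# (assumes OpenCV and TensorFlow/Keras; file names and the face size are assumptions)
import cv2
import numpy as np
from tensorflow import keras

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
face_model = keras.models.load_model("cnn_faces.h5")  # trained on 19 named persons plus "other"

def classify_person(frame, person_box, class_names):
    """Return the label to voice for a person detected by YOLO; class_names lists "other" and the named persons."""
    x1, y1, x2, y2 = map(int, person_box)
    roi = frame[y1:y2, x1:x2]
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return "person"                     # no visible face: keep the generic class name
    fx, fy, fw, fh = faces[0]
    face = cv2.resize(roi[fy:fy + fh, fx:fx + fw], (96, 96)) / 255.0
    probs = face_model.predict(face[np.newaxis, ...], verbose=0)[0]  # softmax probabilities per class
    name = class_names[int(np.argmax(probs))]
    return "unknown person" if name == "other" else "person is " + name
</preformat>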
      <p>The fourth stage involves the formation of an audio stream. The classes of objects from the
video stream are sorted for voiceover according to the defined priorities of danger for visually
impaired people, as well as with identified named persons. It is important to note that if several
objects with the same priority are classified in the frame, they will be voiced in the order in which
YOLO classifies them. If the user has set a priority for the "person" class, then all persons
(including named persons) will be voiced according to this priority. If there are several named
persons in the frame, they will be voiced in the order of their classification. If the priority for the
"person" class is not set, named persons will be voiced after all objects with a higher priority, in the
order in which they were classified. The voiceover update frequency is set by the user to prevent
excessive load on the perception of information. Since this study is experimental in nature, the
audio stream is updated within a 5-second interval, which can be changed by the user if necessary,
and the output of classified objects to the application log has no delays. As part of the study, the
following list of priority objects for visually impaired people was determined: bicycle, car,
motorcycle, bus, train, truck, traffic_light, stop_sign. The specified list is purely experimental and
can be changed or supplemented if necessary.</p>
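      <p>The ordering rule of this stage can be expressed as a simple sort key: classes from the
danger-priority list come first, in the order of the list, and all other classes (including named persons
when no «person» priority is set) follow in the order of classification. The helper below is a sketch of
this rule, not the original implementation; for example, passing ["person", "car", "dog", "traffic_light"]
to it returns ["car", "traffic_light", "person", "dog"], so dangerous objects are voiced first.</p>
      <preformat>
# Sketch of stage 4: ordering classified objects for voicing by danger priority
DANGER_PRIORITY = ["bicycle", "car", "motorcycle", "bus",
                   "train", "truck", "traffic_light", "stop_sign"]

def order_for_voicing(labels, priority=DANGER_PRIORITY):
    """Sort labels so prioritized classes come first, in list order; ties keep the detection order."""
    rank = {name: i for i, name in enumerate(priority)}
    # Python's sort is stable, so objects with equal priority stay in the order YOLO produced them
    return sorted(labels, key=lambda name: rank.get(name, len(priority)))
</preformat>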
      <p>The output data of the method are classes of objects with an assessment of the confidence of
their belonging to the corresponding classes, classes of classified named persons, if they have been
identified, as well as an audio stream formed on the basis of the priority of dangerous objects for
visually impaired people, and containing the specified classes of objects and named persons, if such
are classified.</p>
      <p>Thus, the presented method of augmented reality audiostream creation using CNN can
contribute to increasing inclusion and safety for people with visual impairments.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Experiment</title>
      <p>To test the presented method, it is necessary to study the existing pre-trained YOLO neural
network models, as well as to investigate the influence of the batch size and the number of epochs
when training the CNN model on the accuracy of the classification of named persons by test
software (Figure 4).</p>
      <p>а)
b)</p>
      <p>
        In the process of experimental research, test software for creating an augmented reality audio
stream was developed in Python (Figure 4), which allowed testing the proposed method of
augmented reality audiostream creation using CNN. To implement this software prototype, the
following Python libraries were used: «tensorflow» [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ], «pandas» [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ], «PyQt5» [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ], «numpy»
[
        <xref ref-type="bibr" rid="ref40">40</xref>
        ], «ultralytics» [
        <xref ref-type="bibr" rid="ref41">41</xref>
        ]. All experiments were performed on a computer with the following
configuration: Intel Core i5-11400H processor, 16 GB DDR4 (3200 MT/s) RAM and an NVIDIA GeForce
RTX 3050 Laptop GPU.
      </p>
      <p>
        To study the models, the COCO dataset was used, which contains 330,000 images (of which over
200,000 are annotated) across 80 object classes [
        <xref ref-type="bibr" rid="ref42">42</xref>
        ] and was tested for balance and representativeness [43]. The results of experiments using the
developed test software are presented in the next section.
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Discussion</title>
      <p>To study object classification at stage 2 of the developed method, the YOLOv5n, YOLOv5x,
YOLOv8n, YOLOv8x, YOLOv11n, and YOLOv11x models were compared, which are pre-trained to
classify 80 environmental objects, including, for example: car, motorcycle, traffic light, bench, dog,
backpack, tennis racket, apple, chair, toothbrush, etc. The models were compared using the metrics
Accuracy, Precision, Recall, etc. [44, 45]. Pre-trained objects can be divided into the following
categories: people, transport, infrastructure, animals, personal belongings, sports equipment, dishes
and kitchen items, food, furniture, household appliances, stationery. Figure 5 shows the results of
the macrometrics Accuracy, Precision, Recall, mAP@0.5, mAP@0.5:0.95 for each of the tested
YOLO models.</p>
      <p>The study found that YOLOv11x has the best metrics for classifying objects in a video stream.
In particular, Accuracy of 74.1%, Precision of 73.74% and Recall of 65.85% indicate that the model can
correctly identify objects. The obtained mAP@0.5 indicator (mean average precision at an IoU threshold
of 0.5) is 71.34%, and mAP@0.5:0.95 has a value of 54.85%. These results
indicate that the model finds objects well even with a small error in positioning. In modern
computer vision systems aimed at helping people with visual impairments, one of the key
indicators is the speed of real-time image processing. In particular, the number of frames per second (FPS)
determines how quickly the model can analyze the environment and provide the user with relevant
information. High FPS is important for ensuring continuous and accurate object recognition,
especially in changing environmental conditions. As part of the study, FPS was calculated for the
YOLOv5x, YOLOv8x, YOLOv11x models, which allows us to assess their suitability for real-time
application. The models were tested on a video [46] containing scenes with people and cars. The
results of FPS calculations are presented in Table 1.</p>
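      <p>The exact measurement procedure is not described in the paper; the snippet below is only one
possible way to obtain such per-model FPS values: every frame of the test video is passed through the
detector without displaying the image or drawing bounding boxes, and the processed frame count is
divided by the elapsed time.</p>
      <preformat>
# Sketch of how per-model FPS can be measured on a video file (no display, no box drawing)
import time
import cv2
from ultralytics import YOLO

def measure_fps(weights, video_path):
    """Run the detector over every frame of the video and return the average frames per second."""
    model = YOLO(weights)
    cap = cv2.VideoCapture(video_path)
    frames, start = 0, time.perf_counter()
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        model.predict(frame, verbose=False)  # inference only, no visualization
        frames += 1
    cap.release()
    return frames / (time.perf_counter() - start)
</preformat>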
      <p>The YOLO model test results show a noticeable increase in FPS when moving to newer
versions. In particular, the YOLOv5x model has the lowest average FPS (143.47), which means a
longer frame processing time, while YOLOv8x improves this figure to 155.32 FPS, reducing the
total processing time by 0.4 seconds. The best results are shown by YOLOv11x, which reaches
182.66 FPS, which is 27% faster than YOLOv5x and almost 18% faster than YOLOv8x.</p>
      <p>The reduction in the total frame processing time from 5.23 seconds (YOLOv5x) to 4.11 seconds
(YOLOv11x) indicates a significant increase in efficiency, which is critical for real-time object
recognition, especially for systems aimed at assisting the visually impaired, where even small
delays can affect safety and usability.</p>
      <p>The FPS indicators given in Table 1 were calculated during the processing of the video stream
by YOLO models without displaying the image on the screen and without visualizing the
superimposed bounding boxes. At the same time, for the correct display of the video in the software
prototype together with the visualization of the superimposed bounding boxes of the detected
objects, the optimal frame rate of 30 FPS was empirically determined, which can be changed in the
settings depending on the user's requirements and the computing resources of the system.</p>
      <p>Considering the conducted studies, YOLOv11x is the best of the tested models, providing the
highest indicators of Accuracy, Precision, Recall, mAP@0.5, mAP@0.5:0.95, FPS and the lowest
total frame processing time, therefore, the micrometric indicators of the YOLOv11x model were
further investigated in more detail. Figure 6 shows the micrometric indicators for each class of only
the priority objects in terms of danger for people with visual impairments, which were listed in
section 3.</p>
      <p>The evaluation of the YOLOv11x model indicates its balanced performance in
classifying objects of different classes. The Precision value is stable for most classes, approaching
73.74%, which indicates the overall high ability of the model to correctly classify objects without a
significant number of false positives. However, the Recall indicator shows significant variability,
which may indicate the heterogeneous ability of the model to find all objects of certain categories
in the input images.</p>
      <p>The lowest Recall level is observed for the «bicycle» class (36.06%), which indicates the
difficulty of the model in detecting this type of object due to the peculiarities of its shape, size, or the
presence of partial overlap with other objects. At the same time, the «train» class (75.13%) shows
the highest Recall indicator, which indicates a high level of detection of this object in the test set.</p>
      <p>As can be seen from Figure 6, the YOLOv11X neural network model demonstrates
aboveaverage prediction accuracy for each class, but for some classes, such as «bicycle» and «traffic
light», it demonstrates lower values than for other classes, which is associated with the complexity
of their detection. However, if we evaluate the model in the context of 80-class classification, the
calculated indicators are good.</p>
      <p>For the classification of named persons, the CNN neural network model was trained to identify
19 named persons plus a separate class «other», which contains a mix of 50 photos of different
persons taken from the dataset [47]. To train the neural network model, 50 face frames were taken
from the video stream for each person in different positions using the software prototype interface.
The total number of samples is 1,000 images, and all classes were used equally in the training
process.</p>
      <p>The influence of batch size and number of epochs when training the CNN model on the
accuracy of classification of named persons is given in Table 2. The following macrometric
indicators were obtained: Accuracy 97%, Precision 95%, Recall 96%, F1-score 95%.</p>
      <p>Micrometrics Accuracy, Precision, Recall, F1-score are given in Table 3.</p>
      <p>According to the results presented in Table 2, the optimal values of batch size are 32 and the
number of epochs is 16, since with these parameters the highest classification accuracy of 97% is
achieved.</p>
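      <p>The architecture of the CNN is not specified in the paper, so the following Keras sketch only
illustrates how the batch size and the number of epochs from Table 2 enter the training procedure for
20 classes (19 named persons plus «other»); the layer stack, input size and variable names are
assumptions.</p>
      <preformat>
# Minimal Keras sketch of training the named-person classifier; the real architecture is not specified
# in the paper, so the layer stack below is only an illustration of the training setup
from tensorflow import keras
from tensorflow.keras import layers

def build_face_cnn(num_classes=20, input_shape=(96, 96, 3)):
    return keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),  # probabilities over named persons + "other"
    ])

model = build_face_cnn()
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# x_train / y_train: the 1,000 face crops and one-hot labels prepared as described above
# model.fit(x_train, y_train, batch_size=32, epochs=16, validation_split=0.2)
</preformat>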
      <p>
        The results of the CNN neural network model for the classification of named persons
demonstrate high accuracy. In particular, the micrometrics Accuracy, Precision, Recall and F1-score
indicate sufficient accuracy and completeness of recognition for each class. The obtained neural
network model has higher accuracy rates, compared to [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ], where the Accuracy rate reaches
96.12%, as well as compared to [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ] with an Accuracy rate of 94% and [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ] with an Accuracy rate of
96.5%.
      </p>
      <p>The limitations of the used CNN model are that it demonstrates a decrease in the level of
confidence in the classification of named persons with a significant head rotation, in particular, if
the face is at a 3/4 angle or in profile. In such cases, the model cannot identify a person due to the
lack of a sufficient number of recognition features. Also, glasses and masks reduce the
classification confidence score, but headgear does not affect the classification accuracy.</p>
      <p>Another major limitation is the lack of an algorithm for measuring the distance to objects in the
frame. Since the model analyzes two-dimensional images, there is a problem with correctly
identifying objects that are either too close or too far from the camera. Also, due to the distortion
of the perspective of objects in the frame and the loss of details, such objects may be incorrectly
classified or even missed. For example, a car that is at a considerable distance may be recognized as
a different class due to its reduced size and distortion of proportions. An example of incorrect
classification of a car is shown in Figure 7. This is also especially critical for named persons, since
the accuracy of their classification directly depends on the quality and size of the face in the frame.</p>
      <p>Another limitation is the frame refresh rate and the creation of the current audio stream. The
delay between frame processing can cause situations where the object has already left the field of
view, but its sound is still being played. This can create problems for users, since the audio stream
will not be synchronized with the current situation in the environment.</p>
      <p>A prospect for further research is to study the method in real conditions to determine the optimal
delay of the audio stream output, as well as to use sensors to estimate the distance to objects, which
will compensate for the scaling problem and avoid false classifications of objects that are too close
to or too far from the device camera.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>A method of augmented reality audiostream creation using CNN is proposed, which can increase
inclusion and safety for people with visual impairments due to additional classification of named
persons and priority audio output of dangerous objects into the audio stream.</p>
      <p>To identify objects in the video stream, the YOLOv11x neural network model was used, which
allows objects from the environment, both outdoors and indoors, to be identified and classified in real
time by analyzing the video stream received from the camera. To classify named persons identified by
the YOLOv11x model, a cascade classifier for face identification and a CNN neural network model
trained for multi-class classification of named persons were used.</p>
      <p>For applied research of the method proposed in the paper, test software was developed and a
comparison of the neural network models YOLOv5n, YOLOv5x, YOLOv8n, YOLOv8x, YOLOv11n, and
YOLOv11x was performed. By analyzing the obtained results, namely the macrometrics Precision,
Recall, mAP@0.5, mAP@0.5:0.95 and FPS, it was determined that the YOLOv11x model is better
suited for identifying and classifying objects in the environment, as it received higher
indicators. In particular, the Accuracy value is 74.1%, Precision is 73.74%, which indicates the high
accuracy of the model in assigning the detected objects to the correct classes. Recall, which is
65.85%, reflects the ability of the model to find a significant proportion of all objects present,
although some of them may remain undetected. The mAP@0.5 value, which is 71.34%, indicates the
overall quality of the model at a relatively low overlap threshold, and mAP@0.5:0.95, which is
54.85%, demonstrates its accuracy at different overlap thresholds. The calculated FPS of the
YOLOv5x, YOLOv8x, YOLOv11x models also indicates that YOLOv11x, compared to others, is able
to process the same number of frames in a shorter period of time.</p>
      <p>The obtained results of macrometrics and micrometrics Accuracy, Precision, Recall and F1-score
of the CNN neural network model for the task of classification of named persons demonstrate high
classification accuracy. Macrometrics (Accuracy 97%, Precision 95%, Recall 96%, F1-score 95%)
indicate a general high accuracy of classification of named persons. Minor variations in the
indicators for individual classes indicate a minimum number of errors, which does not significantly
affect the overall accuracy of classification of named persons.</p>
      <p>The proposed method has a limitation in that there is no algorithm for measuring the distance to
objects, which leads to a decrease in the accuracy of their identification and classification,
especially for distant objects. In addition, a limited frame refresh rate can cause a delay in the audio
stream, due to which objects are voiced late or after they disappear from the field of view.</p>
      <p>Therefore, the method presented in the paper can increase the inclusion and safety of people
with visual impairments. The developed method differs from existing ones in its ability to classify named
individuals in the video stream, boosting the inclusion of visually impaired people; the method also
prioritizes dangerous objects when generating the audiostream, boosting the safety of visually
impaired people. Considering the above, the method of augmented reality audiostream creation
using CNN will contribute to achieving UNDP SDG No. 3, No. 10 and No. 11.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
      <p>[43] O. Sobko, O. Mazurets, M. Molchanova, I. Krak, O. Barmak, Method for analysis and formation
of representative text datasets, CEUR Workshop Proceedings, 3899 (2024) 84-98. URL:
https://ceur-ws.org/Vol-3899/paper9.pdf.
[44] Y. Krak, O. Barmak, O. Mazurets, The Practice Investigation of the Information Technology
Efficiency for Automated Definition of Terms in the Semantic Content of Educational
Materials, CEUR Workshop Proceedings, 163 (2016) 237-245. doi:10.15407/pp2016.0203.237.
[45] O. Zalutska, M. Molchanova, O. Sobko, O. Mazurets, O. Pasichnyk, O. Barmak, I. Krak, Method
for Sentiment Analysis of Ukrainian-Language Reviews in E-Commerce Using RoBERTa
Neural Network, CEUR Workshop Proceedings, 3387 (2023) 344-356. URL:
https://ceur-ws.org/Vol-3387/paper26.pdf.
[46] TechChannel00001, YouTube Video, 2025. URL:
https://www.youtube.com/watch?v=Gr0HpDM8Ki8.
[47] Kaggle, LFW People Dataset, 2025. URL:
https://www.kaggle.com/datasets/atulanandjha/lfwpeople.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] WHO, Blindness and Visual Impairment, 2025. URL: https://www.who.int/news-room/fact-sheets/detail/blindness-and-visual-impairment.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] 0300, Useful Information, 2025. URL: https://0300.com.ua/cikave/useful-info.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] S. Cai, A. Ram, Z. Gou, M. A. W. Shaikh, Y.-A. Chen, Y. Wan, K. Hara, S. Zhao, D. Hsu, Navigating Real-World Challenges: A Quadruped Robot Guiding System for Visually Impaired People in Diverse Environments, in: CHI '24: CHI Conference on Human Factors in Computing Systems, ACM, New York, NY, USA, 2024. doi:10.1145/3613904.3642227.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] S. Dembitska, I. Sivert, Digital accessibility in education: challenges and prospects, Health Saf. Pedagog. 9.2 (2024) 57-63. doi:10.31649/2524-1079-2024-9-2-057-063.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] A. M. R. Nielsen, B. L. Due, L. Lüchow, The eye at hand: when visually impaired people distribute seeing with sensing AI, Vis. Commun. (2024). doi:10.1177/14703572241227517.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] L. R. d. Souza, R. Francisco, J. E. d. Rosa Tavares, J. L. V. Barbosa, Intelligent environments and assistive technologies for assisting visually impaired people: a systematic literature review, Univers. Access Inf. Soc. (2024). doi:10.1007/s10209-024-01117-y.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] A. Lavric, C. Beguni, E. Zadobrischi, A.-M. Cailean, S.-A. Avatamanitei, A Comprehensive Survey on Emerging Assistive Technologies for Visually Impaired Persons: Lighting the Path with Visible Light Communications and Artificial Intelligence Innovations, Sensors 24.15 (2024) 4834. doi:10.3390/s24154834.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] V. Moram, S. Zahruddin, S. Kumar, Multifunctional Assistive Smart Glasses for Visually Impaired, SN Comput. Sci. 6.2 (2025). doi:10.1007/s42979-025-03701-2.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] I. Patel, M. Kulkarni, N. Mehendale, Review of sensor-driven assistive device technologies for enhancing navigation for the visually impaired, Multimedia Tools Appl. (2023). doi:10.1007/s11042-023-17552-7.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Z. J. Muhsin, R. Qahwaji, F. Ghanchi, M. Al-Taee, Review of substitutive assistive tools and technologies for people with visual impairments: recent advancements and prospects, J. Multimodal User Interfaces (2023). doi:10.1007/s12193-023-00427-4.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] I. Krak, O. Sobko, M. Molchanova, I. Tymofiiev, O. Mazurets, O. Barmak, Method for neural network cyberbullying detection in text content with visual analytic, CEUR Workshop Proceedings, 3917 (2025) 298-309. URL: https://ceur-ws.org/Vol-3917/paper57.pdf.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] I. Krak, O. Zalutska, M. Molchanova, O. Mazurets, R. Bahrii, O. Sobko, O. Barmak, Abusive speech detection method for Ukrainian language used recurrent neural network, CEUR Workshop Proceedings, 3688 (2024) 16-28. doi:10.31110/COLINS/2024-3/002.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] O. Kovalchuk, V. Slobodzian, O. Sobko, M. Molchanova, O. Mazurets, O. Barmak, I. Krak, N. Savina, Visual Analytics-Based Method for Sentiment Analysis of COVID-19 Ukrainian Tweets, Lecture Notes on Data Engineering and Communications Technologies, 149 (2023) 591-607. doi:10.1007/978-3-031-16203-9_33.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] I. Krak, V. Didur, M. Molchanova, O. Mazurets, O. Sobko, O. Zalutska, O. Barmak, Method for political propaganda detection in internet content using recurrent neural network models ensemble, CEUR Workshop Proceedings, 3806 (2024) 312-324. URL: https://ceur-ws.org/Vol-3806/S_36_Krak.pdf.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] I. Krak, O. Zalutska, M. Molchanova, O. Mazurets, E. Manziuk, O. Barmak, Method for neural network detecting propaganda techniques by markers with visual analytic, CEUR Workshop Proceedings, 3790 (2024) 158-170. URL: https://ceur-ws.org/Vol-3790/paper14.pdf.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] R. Szekely, C. Holloway, M. Bandukda, Understanding the Psychosocial Impact of Assistive Technologies for Blind and Partially Sighted People: Protocol for a Scoping Review (Preprint), JMIR Res. Protoc. (2024). doi:10.2196/65056.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>C. G.</given-names>
            <surname>Richardson</surname>
          </string-name>
          ,
          <article-title>The Role of Rehabilitation Medicine in the Psychological Etiology of Blindness or Visual Impairment: A Critical Synthesis</article-title>
          , in:
          <source>Disease and Health Research: New Insights Vol. 6</source>
          , BP International,
          <year>2024</year>
          , pp.
          <fpage>62</fpage>
          -
          <lpage>117</lpage>
          . doi:10.9734/bpi/dhrni/v6/859.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          UNDP, Ukraine,
          <year>2025</year>
          . URL: https://www.undp.org/uk/ukraine.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>N.</given-names>
            <surname>Radwan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Farouk</surname>
          </string-name>
          ,
          <article-title>The Growth of Internet of Things (IoT) In The Management of Healthcare Issues and Healthcare Policy Development</article-title>
          ,
          <source>Int. J. Technol. Innov. Manag. (IJTIM) 1.1</source>
          (
          <year>2021</year>
          )
          <fpage>69</fpage>
          -
          <lpage>84</lpage>
          . doi:10.54489/ijtim.v1i1.8.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharifi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Allam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Bibri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Khavarian-Garmsir</surname>
          </string-name>
          ,
          <article-title>Smart cities and sustainable development goals (SDGs): A systematic literature review of co-benefits and trade-offs</article-title>
          ,
          <source>Cities</source>
          <volume>146</volume>
          (
          <year>2024</year>
          )
          <fpage>104659</fpage>
          . doi:10.1016/j.cities.2023.104659.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Desul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A. G.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. H. M.</given-names>
            <surname>Kamal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Goswami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Kalumba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Biswal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>da Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A. C.</given-names>
            <surname>dos Santos</surname>
          </string-name>
          ,
          <article-title>A bibliometric analysis of sustainable development goals (SDGs): a review of progress, challenges, and opportunities</article-title>
          ,
          <source>Environ. Dev. Sustain.</source>
          (
          <year>2023</year>
          ). doi:10.1007/s10668-023-03225-w.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghafoor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Aslam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wahla</surname>
          </string-name>
          ,
          <article-title>Improving social interaction of the visually impaired individuals through conversational assistive technology</article-title>
          ,
          <source>Int. J. Intell. Comput. Cybern</source>
          . (
          <year>2023</year>
          ). doi:10.1108/ijicc-06-2023-0147.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chaple</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Raut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Patni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Banode</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ninawe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shelke</surname>
          </string-name>
          ,
          <article-title>Artificial Intelligence on Visually Impaired People: A Comprehensive Review</article-title>
          ,
          <source>in: 2024 5th International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV)</source>
          , IEEE,
          <year>2024</year>
          . doi:10.1109/icicv62344.2024.00052.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>I.</given-names>
            <surname>Jeong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <article-title>YOLOv8-Based XR Smart Glasses Mobility Assistive System for Aiding Outdoor Walking of Visually Impaired Individuals in South Korea</article-title>
          ,
          <source>Electronics 14.3</source>
          (
          <year>2025</year>
          )
          <fpage>425</fpage>
          . doi:10.3390/electronics14030425.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Kamran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Orakzai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Noor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. S.</given-names>
            <surname>Afridi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sher</surname>
          </string-name>
          ,
          <article-title>Visually: Assisting the Visually Impaired People Through AI-Assisted Mobility</article-title>
          , (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>G. I.</given-names>
            <surname>Okolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Althobaiti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ramzan</surname>
          </string-name>
          ,
          <article-title>Smart Assistive Navigation System for Visually Impaired People</article-title>
          ,
          <source>J. Disabil. Res. 4.1</source>
          (
          <year>2025</year>
          ). doi:10.57197/jdr-2024-0086.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hemavathy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Sabarika</given-names>
            <surname>Shree</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Priyanka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Subhashree</surname>
          </string-name>
          ,
          <article-title>AI Based Voice Assisted Object Recognition for Visually Impaired Society</article-title>
          ,
          <source>in: 2023 International Conference on Data Science, Agents &amp; Artificial Intelligence (ICDSAAI)</source>
          , IEEE,
          <year>2023</year>
          . doi:10.1109/icdsaai59313.2023.10452456.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sameer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Madan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kannan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. J.</given-names>
            <surname>Upadhye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Patil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rajkumar</surname>
          </string-name>
          ,
          <article-title>AI-based Object Detection for Assisting the Visually Impaired People</article-title>
          ,
          <source>in: 2024 5th International Conference on Mobile Computing and Sustainable Informatics (ICMCSI)</source>
          , IEEE,
          <year>2024</year>
          . doi:10.1109/icmcsi61536.2024.00080.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Hagargund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. V.</given-names>
            <surname>Thota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. F.</given-names>
            <surname>Shaik</surname>
          </string-name>
          ,
          <article-title>Image to speech conversion for visually impaired</article-title>
          ,
          <source>Int. J. Latest Res. Eng. Technol</source>
          .
          <volume>3</volume>
          .
          <issue>06</issue>
          (
          <year>2017</year>
          )
          <fpage>09</fpage>
          -
          <lpage>15</lpage>
          . URL: https://www.academia.edu/download/82052515/2_B2017160.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>D.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <article-title>Object Detection with voice output for visually impaired</article-title>
          ,
          <source>in: 2024 International Conference on Communication, Computing and Internet of Things (IC3IoT)</source>
          , IEEE,
          <year>2024</year>
          . doi:10.1109/ic3iot60841.2024.10550247.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <surname>Haryanto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kholis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Supriyadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ikhsani</surname>
          </string-name>
          ,
          <article-title>The development of face recognition application system (SAFR) as an adaptive teaching support</article-title>
          ,
          <source>in: The 6th International Conference of Ice-Elinvo 2023: Digital Solutions for Sustainable and Green Development</source>
          , AIP Publishing,
          <year>2025</year>
          ,
          <fpage>020007</fpage>
          . doi:10.1063/5.0261215.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <article-title>LittleFaceNet: A Small-Sized Face Recognition Method Based on RetinaFace and AdaFace</article-title>
          ,
          <source>J. Imaging 11.1</source>
          (
          <year>2025</year>
          )
          <fpage>24</fpage>
          . doi:10.3390/jimaging11010024.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>N. Sabah</given-names>
            <surname>Abbod</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Mohasefi</surname>
          </string-name>
          ,
          <article-title>Designing Face Detection Systems with Gray Wolf Optimization</article-title>
          ,
          <source>Iraqi J. Electr. Electron. Eng. 21.2</source>
          (
          <year>2025</year>
          )
          <fpage>64</fpage>
          -
          <lpage>75</lpage>
          . doi:10.37917/ijeee.21.2.7.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Shukla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Tiwari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Ranjan</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <article-title>Face Recognition Using LBPH and CNN</article-title>
          ,
          <source>Recent Adv. Comput. Sci. Commun.</source>
          <volume>17</volume>
          (
          <year>2024</year>
          ). doi:10.2174/0126662558282684240213062932.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vijayakumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vairavasundaram</surname>
          </string-name>
          ,
          <article-title>YOLO-based Object Detection Models: A Review and its Applications</article-title>
          ,
          <source>Multimedia Tools Appl.</source>
          (
          <year>2024</year>
          ). doi:10.1007/s11042-024-18872-y.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          OpenCV, Cascade Classifier Tutorial,
          <year>2025</year>
          . URL: https://docs.opencv.org/4.x/db/d28/tutorial_cascade_classifier.html.
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37] TensorFlow, TensorFlow Official Website,
          <year>2025</year>
          . URL: https://www.tensorflow.org/.
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          Pandas, Pandas Official Website,
          <year>2025</year>
          . URL: https://pandas.pydata.org/.
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          PyQt5, PyQt5 Official Website,
          <year>2025</year>
          . URL: https://pypi.org/project/PyQt5/.
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40] NumPy, NumPy Official Website,
          <year>2025</year>
          . URL: https://numpy.org/.
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          Ultralytics, Ultralytics Official Website,
          <year>2025</year>
          . URL: https://ultralytics.com/.
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          COCO Dataset,
          <year>2025</year>
          . URL: https://cocodataset.org/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>