<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Journal of Wisdom Political Science and Multidisciplinary Sciences</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.48550/arXiv.2407.12687</article-id>
      <title-group>
        <article-title>Avoiding Type I Errors in Image Processing with SIFT/BRISK-keypoints on Android Smartphones</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dmytro Zubov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrey Kupin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kryvyi Rih National University</institution>
          ,
          <addr-line>11 Vitaly Matusevich St., Kryvyi Rih, 50027</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Central Asia</institution>
          ,
          <addr-line>125/1 Toktogul St., Bishkek, 720001, Kyrgyz Republic</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>3688</volume>
      <fpage>24</fpage>
      <lpage>26</lpage>
      <abstract>
        <p>Avoiding false-positive recognition of objects is a topical problem for specific areas, such as detecting traffic signs for visually impaired pedestrians, fire emergency signs inside buildings, and construction safety signs. Existing solutions show that the percentage of incorrectly recognized traffic signs can reach 25 % for smart vehicles. In this study, SIFT/BRISK-keypoints are employed to design the image descriptor. An experiment with ten images of crosswalk traffic signs and 90 other images (including different traffic signs) showed that the false positive rate is zero and the false negative rate equals 50 %. The implementation is based on the Java Android application with the possibility to correct the knowledge base in case of false alarms. Image analysis was performed on smartphones Doogee S96 Pro and Samsung M31 with an execution time of less than one second. The most likely prospect for further development of this study is the design of the set of image descriptors to improve the false negative rate avoiding type I errors at the same time.</p>
      </abstract>
      <kwd-group>
        <kwd>image processing</kwd>
        <kwd>type I error</kwd>
        <kwd>SIFT/BRISK-keypoint</kwd>
        <kwd>Android smartphones</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>A zero false positive rate, i.e., no type I errors [1, 2], together with the minimization of the false negative rate, i.e., type II
errors [1, 2], forms the complex criterion employed in ad-hoc image processing projects, such as detecting
traffic signs for visually impaired pedestrians, fire emergency signs inside buildings, and
construction safety signs. Existing solutions show that the percentage of incorrectly recognized
traffic signs can reach 25 % for smart vehicles [3]. Up-to-date real-life implementations also demand
autonomous and low-energy solutions since Internet connections are often unstable, and the average
ChatGPT request consumes about 0.34 watt-hours and about 0.32176 ml of water [4]. The ecological
impact depends on the neural network models (NNMs): complex responses produce more CO2
emissions than simple responses, and NNMs that provide more accurate responses result in higher
emissions [5]. Reasoning models likewise produce more emissions compared to
concise response models [5]. Aliya Rysbek, a research software engineer at Google DeepMind UK
[6], pointed out at the KIT forum in Bishkek (Kyrgyz Republic) on 29th May 2025 that her team could
recently save about 1 % of the energy consumed by some NNMs which is a huge step considering a
tremendous number of requests processed by Google datacenters worldwide.</p>
      <p>In this study, the autonomous and low-energy software was developed using Java Android mobile
application and SIFT/BRISK-keypoints [7] (Scale-Invariant Feature Transform and Binary Robust
Invariant Scalable Keypoints), which is an implementation of the edge computing principle [8]. Power
efficiency is achieved by executing the performance-optimized code on the continuously running
smartphone without transmitting the data wirelessly. The presented approach employs a unique
image descriptor for every target object which is different from the previously developed method
[7], where a 291-point pattern is applied. Initially, a multithreaded Java Android application takes a
photo via the CameraX library [9], and then a scaling method
generates a new bitmap scaled to a maximum resolution of 500 pixels using bilinear filtering. From
up to 700 keypoints detected by the SIFT method, the keypoints with stable positions across different
SIFT octaves are selected. Next, the BRISK binary descriptor is designed considering keypoints that are
unique on the target image, and the distances to basic keypoints are calculated. Experiments
conducted on the Doogee S96 Pro and Samsung M31 smartphones demonstrated that the execution
time is less than one second, with a false positive rate of zero and a false negative rate of 50 %.</p>
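      <p>The Euclidean distance (2) between keypoint coordinates underpins both the keypoint selection and the descriptor conditions described later. A minimal sketch follows; the class and method names are illustrative, not the authors' code.</p>
      <preformat>
```java
// Sketch of the Euclidean distance (2) between two SIFT/BRISK-keypoints
// (x1, y1) and (x2, y2). Names are illustrative, not the authors' code.
public class KeypointDistance {

    public static double euclidean(double x1, double y1, double x2, double y2) {
        double dx = x1 - x2;
        double dy = y1 - y2;
        return Math.sqrt(dx * dx + dy * dy);
    }

    public static void main(String[] args) {
        // keypoints 19 and 46 from the 1360x1360 octave listed in Section 4
        System.out.println(euclidean(116, 1086, 1243, 1086)); // 1127.0
    }
}
```
      </preformat>
      <p>The same helper applies both to the closest-to-center keypoint selection and to the distance conditions of the descriptor.</p>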
      <p>The remaining part of the paper proceeds as follows: Section 2 reviews relevant works in the
context of the most cutting-edge computer vision techniques. It also introduces the proposed
soft/hardware architecture. Section 3 outlines the problem setup and the experiment setup from image
capturing to image matching. Section 4 presents a successful experiment conducted with 100 images.
Results and discussion are presented in Section 5 and Section 6, respectively. Conclusions are
summarized in Section 7.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>Up-to-date image processing algorithms emphasize accuracy, interpretability, transparency, speed,
and scalability while reducing computational costs [10]. Some of the most cutting-edge computer
vision techniques are as follows:
1. Convolutional neural networks (CNNs) [11, 12].
2. Vision transformers (ViTs) [13].
3. Segmentation techniques [14].
4. In addition, there are numerous other image processing techniques, such as generative
adversarial networks, super-resolution algorithms, adaptive histogram equalization, and
denoising algorithms [10].</p>
      <p>Two-dimensional CNNs, such as those presented in [11, 12], are prevalent in image processing
nowadays. They use convolution to detect patterns in images, and then classify them, detect objects,
apply semantic segmentation, etc. Prior to CNN processing, images undergo preprocessing steps such
as homogenization, normalization, and principal component analysis [12]. Basic CNN components
are the convolution layer, pooling layer, activation function, batch normalization, dropout, and fully
connected layer. The most common CNN models are AlexNet, ResNet, VGG, GoogleNet, Xception,
Inception, DenseNet, and EfficientNet [13].</p>
      <p>In contrast to CNNs, which depend on hierarchical feature extraction, ViTs analyze images as
sequences of smaller patches, enabling them to capture contextual information and long-range
dependencies, resulting in improved image recognition capacities [13].</p>
      <p>Image segmentation divides an image into distinct regions based on certain characteristics
[14, 15]. Techniques like U-Net architectures, Canny edge detection, and Mask R-CNNs provide
efficient and precise solutions [10]. In this study, each image is segmented into two regions: the
object O and the background B [15]. Following the application of a segmentation algorithm, pixels
or other image attributes are classified into either region O or B.</p>
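      <p>The two-region split described above can be sketched as a simple thresholding step. The threshold value and the names below are assumptions for illustration, not the authors' implementation.</p>
      <preformat>
```java
// Sketch: thresholding a grayscale image into the object region O and the
// background region B. The threshold value is an illustrative assumption.
public class ThresholdSegmentation {

    // returns true for pixels assigned to the object region O, false for B
    public static boolean[][] segment(int[][] image, int threshold) {
        int h = image.length;
        int w = image[0].length;
        boolean[][] object = new boolean[h][w];
        for (int y = 0; y != h; y++) {
            for (int x = 0; x != w; x++) {
                object[y][x] = image[y][x] > threshold;
            }
        }
        return object;
    }

    public static void main(String[] args) {
        int[][] image = {
            {10, 200},
            {30, 250}
        };
        boolean[][] o = segment(image, 128);
        System.out.println(o[0][1]); // true: bright pixel belongs to O
        System.out.println(o[0][0]); // false: dark pixel belongs to B
    }
}
```
      </preformat>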
      <p>The above-stated image processing techniques face common challenges in real-life
implementation [10]:</p>
      <p>1. High computational cost: values can differ several times. Solution: computational complexity
reduction (the code was performance-optimized in this study).
2. Overfitting diminishes the model's generalization ability, potentially leading to lower
accuracy. Solution: data augmentation and regularization (thresholding is used to separate
the object and background regions in this study).
3. Noise and distortion can compromise the accuracy of image processing algorithms. Solution:
images should be filtered (bilinear and Gaussian filters are employed in this study).
4. Interpretability and transparency of some AI-driven image processing methods. Solution:
non-AI-driven image processing methods (image processing with SIFT/BRISK-keypoints is
employed in this study).
5. Real-time processing constraints. Solution: high-performance soft-/hardware (multicore
smartphones were utilized in this study).
6. Ethical and privacy concerns. Solution: autonomous systems (edge computing with a mobile
Java Android application was implemented in this study).</p>
      <p>The growing demand for machine learning on mobile devices has led to the development of
lightweight CNN models, such as MobileNet [11, 16], which are optimized for use with limited
computational power and memory. MobileNet V1 employed depthwise separable convolutions, and
MobileNet V2 improved upon this with inverted residual blocks, further enhancing efficiency. Later
versions of MobileNet optimize performance for mobile CPUs. Data requirements are the key
drawback of lightweight CNN models since they require a significant amount of data to be trained
to achieve acceptable performance on mobile devices. Some projects, as presented in this study, lack
large datasets. This limitation leads to a loss of accuracy and challenges in training and optimization.</p>
      <p>In this study, the system requires a false positive rate of zero and a minimized false negative rate.
Considering the challenges outlined and the latest image processing techniques, the proposed
architecture of the project soft-/hardware is presented in Figure 1. In this prototype, the end-user
interacts with the smartphone via the simple user interface based on the button element with an
onClick listener [17]. The multithreaded Java Android application processes images captured by the
smartphone camera using the CameraX library and SIFT/BRISK-keypoints [7]. To update the
knowledge base, the mobile application should have the option to download the updated information
from Internet resources, such as GitHub and Firebase. For this purpose, the study proposes the use
of JSON data format [17] because of its lightweight nature and widespread use in mobile applications.</p>
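      <p>A knowledge-base record in this spirit might be serialized as follows. The field names are hypothetical, since the paper does not publish its JSON schema.</p>
      <preformat>
```java
// Sketch of a knowledge-base record serialized as lightweight JSON for
// download from GitHub or Firebase. The field names (sign, thresholdV, rsd)
// are assumptions, not the authors' schema.
public class KnowledgeBaseEntry {

    public static String toJson(String sign, int thresholdV, double rsd) {
        StringBuilder sb = new StringBuilder();
        sb.append("{");
        sb.append("\"sign\":\"").append(sign).append("\",");
        sb.append("\"thresholdV\":").append(thresholdV).append(",");
        sb.append("\"rsd\":").append(rsd);
        sb.append("}");
        return sb.toString();
    }

    public static void main(String[] args) {
        // example values taken from the Results section (Rule 1, RSD = 0.05)
        System.out.println(toJson("Crosswalk left", 20, 0.05));
    }
}
```
      </preformat>
      <p>Such a compact record can be fetched over an unstable connection and cached on the device, matching the autonomy requirement stated in the introduction.</p>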
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <sec id="sec-3-1">
        <title>3.1. Problem setup</title>
        <p>The following notation is used in the remainder of the paper. Consider a grayscale
image I(x, y) with n SIFT/BRISK-keypoints [7] whose coordinates are pairs of real numbers (xn, yn).</p>
        <p>[Figure 1: The proposed architecture of the soft-/hardware complex. The diagram connects the end-user, the user interface, a photo with or without the target object, the knowledge base (GitHub, Google Firebase) exchanged as JSON, the Java Android image processing application, and the environment.]</p>
        <sec id="sec-3-1-6">
          <title>Algorithm A</title>
          <p>In this study, algorithm A employs a score that quantifies the difference between the object and
background regions. This score is determined based on various conditions cj (j&lt;m, where m=m1+m2
is the number of conditions in algorithm A; m1 is the number of obligatory conditions; m2 is the
number of optional conditions), the Euclidean distances (2) and lines (3), and the pixel values of a grayscale image I(x, y). If
1{cj} represents the indicator function, which returns 1 if cj is true and 0 otherwise, the characteristic
mob of the object region is calculated as follows:
mob = (∏ j=0..m1−1 1{cj}) · (∑ j=m1..m1+m2−1 1{cj}), (5)
where the m1 conditions in the product operation are obligatory and the m2 conditions in the summation
operation are optional.</p>
          <p>Similar to mob (5), the characteristic mbg of the background region is calculated in the same way with the
given conditions cj.</p>
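          <p>Under the definitions above, the characteristics and the thresholded score of algorithm A can be sketched as follows. This is an illustrative reading of equations (5)-(7), not the authors' code; the example values in the test come from the Results section.</p>
          <preformat>
```java
// Sketch of the characteristic from equation (5): the product of the
// obligatory indicators 1{cj} multiplied by the sum of the optional
// indicators 1{cj}. The same method serves for mob and mbg, fed with the
// corresponding condition truth values. Names are illustrative.
public class Characteristic {

    // obligatory[j] and optional[j] hold the truth values of the conditions cj
    public static int value(boolean[] obligatory, boolean[] optional) {
        int product = 1;
        for (boolean c : obligatory) {
            // a single failed obligatory condition zeroes the whole characteristic
            if (!c) {
                product = 0;
            }
        }
        int sum = 0;
        for (boolean c : optional) {
            if (c) {
                sum = sum + 1;
            }
        }
        return product * sum;
    }

    // thresholded score: 1 indicates presence, 0 absence of the target object
    public static int score(int mob, int thresholdV) {
        return (mob >= thresholdV) ? 1 : 0;
    }

    public static void main(String[] args) {
        boolean[] obligatory = {true, true, true};
        boolean[] optional = {true, false, true, true};
        int mob = value(obligatory, optional); // all obligatory hold, 3 optional hold
        System.out.println(score(mob, 2));
    }
}
```
          </preformat>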
          <p>[Equations (1)-(7) are not reproduced here: (1) defines the set of keypoint coordinates, (2) the Euclidean distance between keypoints, and (3) the lines connecting keypoints; (4)-(7) define the characteristics mob and mbg and the score of algorithm A.]</p>
        </sec>
        <sec id="sec-3-1-7">
          <title>Score and threshold</title>
          <p>Then, a score is computed using the threshold value V. The threshold value V is determined from the
training data. Thus, algorithm A indicates the presence (score = 1) or absence (score = 0) of the target object
in the grayscale image I(x, y).</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Experiment setup</title>
        <p>In this study, a group named "Crosswalk" of traffic signs ("Crosswalk right", "Crosswalk left", and
"Zebra crossing"; see Figure 2) is identified in the image using the standard representation officially
accepted in the Kyrgyz Republic [18]. The "Crosswalk" group is a subset of the traffic signs intended
for pedestrians. Avoiding type I errors in this object recognition is a crucial point in the spatial
cognition of visually impaired people [7].</p>
        <p>The core steps of the above-stated algorithm A are as follows:</p>
        <p>1. Capturing the image with the smartphone camera and the CameraX Android API.
2. Downsampling the image with bilinear filtering in a Java Android application.
3. Localization of SIFT/BRISK-keypoints.
4. Selection of SIFT/BRISK-keypoints that have stable positions across different SIFT octaves.
5. Designing the image descriptor based on selected SIFT/BRISK-keypoints and algorithm A (7).
6. Image matching.</p>
        <p>[Figure 2: The "Crosswalk" group of traffic signs (panels A-C): "Crosswalk right", "Crosswalk left", and "Zebra crossing".]</p>
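        <p>Step 2, downsampling with bilinear filtering, can be sketched as follows. This is an illustrative corner-aligned bilinear resampler in plain Java, not the authors' CameraX-based implementation.</p>
        <preformat>
```java
// Sketch: bilinear resampling of a grayscale image, as used when scaling an
// image down (e.g., so that the larger side does not exceed 500 pixels).
// The corner-aligned coordinate mapping is an illustrative choice.
public class BilinearDownsample {

    public static int[][] resize(int[][] src, int dstH, int dstW) {
        int srcH = src.length;
        int srcW = src[0].length;
        int[][] dst = new int[dstH][dstW];
        for (int y = 0; y != dstH; y++) {
            for (int x = 0; x != dstW; x++) {
                // map the destination pixel back into source coordinates
                double sy = (dstH == 1) ? 0 : y * (srcH - 1.0) / (dstH - 1.0);
                double sx = (dstW == 1) ? 0 : x * (srcW - 1.0) / (dstW - 1.0);
                int y0 = (int) sy, x0 = (int) sx;
                int y1 = Math.min(y0 + 1, srcH - 1);
                int x1 = Math.min(x0 + 1, srcW - 1);
                double fy = sy - y0, fx = sx - x0;
                // weighted average of the four neighbouring source pixels
                double v = src[y0][x0] * (1 - fy) * (1 - fx)
                         + src[y0][x1] * (1 - fy) * fx
                         + src[y1][x0] * fy * (1 - fx)
                         + src[y1][x1] * fy * fx;
                dst[y][x] = (int) Math.round(v);
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        int[][] src = {
            {0, 100},
            {100, 200}
        };
        int[][] dst = resize(src, 3, 3);
        System.out.println(dst[1][1]); // center interpolates to 100
    }
}
```
        </preformat>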
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiment</title>
      <sec id="sec-4-1">
        <title>4.1. Selection of SIFT/BRISK-keypoints with stable positions across different SIFT octaves</title>
        <sec id="sec-4-1-2">
          <title>Localization of stable keypoints</title>
          <p>The image capturing by the smartphone's camera and the CameraX Android API, downsampling the
image with bilinear filtering in the Java Android application, and localization of SIFT/BRISK-keypoints
are the algorithm steps, which are similar to those described in [7]. An example of the selection of
SIFT/BRISK-keypoints on the traffic sign "Crosswalk left" with stable positions across different SIFT
octaves is shown in Figure 3, where the first octave of size 340×340 pixels contains 49
SIFT/BRISK-keypoints, the second octave of 680×680 pixels contains 109 SIFT/BRISK-keypoints, and the third octave
of 1360×1360 pixels contains 124 SIFT/BRISK-keypoints (fuchsia and turquoise colors are used for the
third/second/first and fourth/third/second DoG (Difference of Gaussians) functions, respectively).
The values of the population standard deviations in the Gaussian blur operator are consistent with
those presented in [7]. Only the 700 keypoints that are closest to the center of the image, based on
Euclidean distance (2), are considered.</p>
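          <p>The selection of the keypoints closest to the image center can be sketched as follows. The names are illustrative; only the center-distance ordering and the keypoint limit come from the text.</p>
          <preformat>
```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch: keep only the k keypoints closest to the image center by Euclidean
// distance (2), as done for the limit of 700 keypoints. Names are illustrative.
public class CentralKeypoints {

    // points[i] = {x, y}; (cx, cy) is the image center
    public static double[][] closestToCenter(double[][] points, double cx, double cy, int k) {
        double[][] sorted = points.clone();
        // sort by squared distance to the center (the square root is monotone)
        Arrays.sort(sorted, Comparator.comparingDouble(
                (double[] p) -> (p[0] - cx) * (p[0] - cx) + (p[1] - cy) * (p[1] - cy)));
        return Arrays.copyOf(sorted, Math.min(k, sorted.length));
    }

    public static void main(String[] args) {
        double[][] pts = { {10, 10}, {680, 680}, {600, 700}, {0, 1359} };
        double[][] kept = closestToCenter(pts, 680, 680, 2);
        System.out.println(Arrays.deepToString(kept)); // [[680.0, 680.0], [600.0, 700.0]]
    }
}
```
          </preformat>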
          <p>Analysis of three SIFT octaves shows that 47 SIFT/BRISK-keypoints have stable positions on these
SIFT octaves (see Figure 4). Some coordinates of SIFT/BRISK-keypoints are as follows (the octave of size
1360×1360 pixels is used; the origin of coordinates is at the top left; the x axis is horizontal; the y axis is
vertical): (x0, y0)=(283, 1064), (x1, y1)=(353, 961), (x19, y19)=(116, 1086), (x31, y31)=(679, 111),
(x46, y46)=(1243, 1086).</p>
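          <p>A possible reading of the stability test across octaves is sketched below: a keypoint counts as stable when a matching keypoint is found, within a small tolerance, in every octave after rescaling to the 1360×1360 frame. The tolerance value and the names are assumptions for illustration.</p>
          <preformat>
```java
// Sketch: a keypoint is treated as stable when, after scaling its coordinates
// to a common octave frame, a keypoint lies within a small radius in every
// octave. The tolerance is an illustrative assumption, not the authors' value.
public class StableKeypoints {

    static boolean foundNear(double[][] pts, double x, double y, double tol) {
        for (double[] p : pts) {
            double dx = p[0] - x, dy = p[1] - y;
            if (tol * tol >= dx * dx + dy * dy) {
                return true;
            }
        }
        return false;
    }

    // octaves[o] holds the keypoints of octave o; scale[o] maps octave o to the common frame
    public static boolean stable(double[][][] octaves, double[] scale, double x, double y, double tol) {
        for (int o = 0; o != octaves.length; o++) {
            double[][] scaled = new double[octaves[o].length][2];
            for (int i = 0; i != octaves[o].length; i++) {
                scaled[i][0] = octaves[o][i][0] * scale[o];
                scaled[i][1] = octaves[o][i][1] * scale[o];
            }
            if (!foundNear(scaled, x, y, tol)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // the same physical point seen in octaves of 340, 680, and 1360 pixels
        double[][][] octaves = {
            { {29.0, 271.5} },
            { {58, 543} },
            { {116, 1086} }
        };
        double[] scale = {4.0, 2.0, 1.0}; // scale each octave up to the 1360x1360 frame
        System.out.println(stable(octaves, scale, 116, 1086, 6.0)); // true
    }
}
```
          </preformat>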
          <p>[Figure 4: SIFT/BRISK-keypoints with stable positions: octave 340×340 pixels (A), octave 680×680 pixels (B), octave 1360×1360 pixels (C); the numeric labels denote the indices of the stable keypoints.]</p>
        </sec>
        <sec id="sec-4-1-3">
          <title>Designing the image descriptor based on the selected SIFT/BRISK-keypoints and algorithm A (7)</title>
          <p>In this study, the following descriptive visual attributes [19] are employed to design indicator
functions 1{cj} using a few human-friendly text descriptions:
1. SIFT/BRISK-keypoints 19, 31, and 46 are the basic components that form a triangle with the other
SIFT/BRISK-keypoints located inside.
2. The grayscale image has an average pixel value denoted as Iav.
3. Euclidean distances (2) between SIFT/BRISK-keypoints.
4. Lines (3) connecting various SIFT/BRISK-keypoints.</p>
          <p>The obligatory conditions cj (j&lt;m1; m1=3 in this study) were formulated by the human expert as
follows:
1. c0: the distances between SIFT/BRISK-keypoints 19-31, 19-46, and 31-46 should be greater
than 100 pixels and equal to one another, with a relative standard deviation (RSD) of 0.05.
2. c1: the pixel values at SIFT/BRISK-keypoints 19, 31, and 46 should be greater than (Iav-20),
indicating that the intensity must be light.
3. c2: the pixel values on the lines (3) connecting SIFT/BRISK-keypoints 19-31, 19-46, and 31-46
should be greater than (Iav-20); a 5 % error is allowed in c2.</p>
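          <p>Condition c0 can be sketched as follows. The names are illustrative; the 100-pixel bound and the 0.05 RSD limit come from the text.</p>
          <preformat>
```java
// Sketch of the obligatory condition c0: the three pairwise distances between
// keypoints 19, 31, and 46 must exceed 100 pixels and be nearly equal, with a
// relative standard deviation (RSD) of at most 0.05. Names are illustrative.
public class ConditionC0 {

    // RSD = population standard deviation divided by the mean
    public static double rsd(double[] v) {
        double mean = 0;
        for (double x : v) {
            mean = mean + x;
        }
        mean = mean / v.length;
        double var = 0;
        for (double x : v) {
            var = var + (x - mean) * (x - mean);
        }
        var = var / v.length; // population variance
        return Math.sqrt(var) / mean;
    }

    public static boolean c0(double d1, double d2, double d3) {
        // every pairwise distance must exceed 100 pixels
        if (Math.min(d1, Math.min(d2, d3)) > 100) {
            // the three distances must be nearly equal
            return 0.05 >= rsd(new double[]{d1, d2, d3});
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(c0(500, 505, 495)); // true: near-equilateral triangle
        System.out.println(c0(500, 300, 500)); // false: RSD too large
    }
}
```
          </preformat>
          <p>The optional conditions below follow the same pattern, replacing the equality test with a match against the template-image distances.</p>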
          <p>The optional conditions cj (m1≤j&lt;(m1+m2); m2=44 in this study) were formulated by a human
expert as follows:
6. c8: the distances between SIFT/BRISK-keypoints 5-19, 5-31, and 5-46 should match the
calculated distances on the template image (see Figure 3) with RSD=0.05. Additionally, the
pixel value at SIFT/BRISK-keypoint 5 should correspond to the relevant
SIFT/BRISK-keypoint on the template image (it must be greater than (Iav-20) in this study).
44. c46: the distances between SIFT/BRISK-keypoints 45-19, 45-31, and 45-46 should match the
calculated distances on the template image (see Figure 3) with RSD=0.05. Additionally, the
pixel value at SIFT/BRISK-keypoint 45 should correspond to the relevant
SIFT/BRISK-keypoint on the template image (it must be less than (Iav-30) in this study).</p>
          <p>The conditions c0-c46 were designed to specifically target the features of crosswalk signs (see
Figure 2).</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>In this study, a Java Android mobile application implements the proposed image processing method
to detect a specific group of Kyrgyz traffic signs. An experiment with ten images of
crosswalk traffic signs and 90 other images (including different traffic signs) showed a false positive
rate of zero and a false negative rate of 50 %. Image analysis was performed on the Doogee
S96 Pro and Samsung M31 smartphones with an execution time of less than one second. All original color
pictures/photos, taken by co-author Dr. Dmytro Zubov, and their grayscale versions with keypoints generated by the SIFT algorithm were
uploaded to the Google Drive folder
https://drive.google.com/drive/folders/15Dk27s8_2mIZnBsLNcq8j11WJbeFAFVP.
Figure 5 shows nine examples of pictures and photos with
average pixel values used in the experiment. Figure 6 presents the grayscale images with
SIFT/BRISK-keypoints for the initial data presented in Figure 5.</p>
      <p>The analysis of the experimental results shows that the target image can be identified using two
distinct rules:</p>
      <p>Rule 1 (for pictures of traffic signs [18]): the threshold value V equals 20 (the closest value
of mob equals 36 for a traffic sign; the closest value of mbg equals 14 for the picture
uploaded in subfolder 12 on Google Drive).</p>
      <p>Rule 2 (for photos taken by the smartphone): the threshold value V equals 9 (the closest
value of mob equals 9 for the traffic sign presented in Figure 5(G); the closest
value of mbg equals 6 for the photo uploaded in subfolder 91 on Google Drive).</p>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>When the project was presented at the Department of Computer Science of the
University of Central Asia and the Department of Computer Systems and Networks of the Kryvyi
Rih National University, two questions were raised during the discussion:</p>
      <p>1. The segmentation technique was employed to process images. However, it was noted that
CNNs are widely used nowadays. An explanation was provided regarding the interpretability
and transparency of the applied approach, especially in safety-critical projects where
understanding the decision-making process is essential.</p>
      <p>2. The smartphone application was developed using the Java programming language. Although this
is a common choice for Android mobile applications, it was suggested to consider using the
Kotlin programming language and the Jetpack Compose declarative paradigm to enhance
code readability for other developers.</p>
      <p>Additionally, the Gemini generative artificial intelligence chatbot [20] was requested to provide
recommendations on what should be improved in the research paper (see Figure 7) on June 28, 2025.
The response included several suggestions for improvement, categorized by different sections of the
manuscript. Three general recommendations are as follows:
1. Consistency in Terminology: While you use "SIFT/BRISK-keypoints" throughout, sometimes
"SIFT keypoints" are mentioned in isolation (e.g., in the experiment section, "Localization of
SIFT/BRISK-keypoints" is followed by "Selection of SIFT/BRISK-keypoints"). Ensure
consistency or clarify if there's a specific reason for the distinction at certain points.
2. Flow and Transitions: Some transitions between paragraphs and sections could be smoother.
Consider using more explicit linking phrases or sentences to guide the reader.
3. Conciseness: While the paper is generally well-structured, some sentences or phrases could
be more concise. For example, instead of "The most likely prospect for further development
of this study is the design of the set of image descriptors to improve the false negative rate
avoiding type I errors at the same time", you could simplify it to "Future work will focus on
designing a set of image descriptors to improve the false negative rate while maintaining
zero Type I errors."</p>
      <p>Some of the above-stated recommendations, such as paraphrasing sentences in conclusions, have
already been considered. Other suggestions are discussed in references or are not critical in the
presented study, and hence they can be taken into account in future work.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>This study presents a new method of image processing with SIFT/BRISK-keypoints and descriptive
visual attributes, implemented in the developed prototype of a Java Android mobile application. The
interpretability, transparency, and zero false positive rate of the applied approach are the key
advantages.</p>
      <p>The core steps of the image processing algorithm are as follows:
1. Capturing the image with the smartphone camera.
2. Downsampling the image with bilinear filtering.
3. Localization of SIFT/BRISK-keypoints.
4. Selection of SIFT/BRISK-keypoints with stable positions across different SIFT octaves.
5. Designing the image descriptor based on the selected SIFT/BRISK-keypoints and descriptive
visual attributes.
6. Image matching.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>This work and the research behind it received support from the universities where the authors
conducted the study. The authors express their sincere gratitude to colleagues at the University of
Central Asia and Kryvyi Rih National University who contributed to this project.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>In preparing this work, the authors employed the Grammarly writing assistant [21] for grammar and
spelling errors, as well as the Gemini generative AI chatbot to discuss the results of the study.
Following the use of these tools, the authors reviewed and edited the content. The authors take full
responsibility for the content of this publication.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] J. Shreffler, M. R. Huecker, Type I and Type II Errors and Statistical Power, 2023. URL: https://www.ncbi.nlm.nih.gov/books/NBK557530/.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] M. D. Lieberman, W. A. Cunningham, Type I and Type II Error Concerns in fMRI Research: Rebalancing the Scale, Social Cognitive and Affective Neuroscience 4.4 (2009) 423-428. doi:10.1093/scan/nsp052.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] D., -Ready Traffic Sign Recognition Systems in Cars: A Test Field Study, Energies 14.12 (2021) 3697. doi:10.3390/en14123697.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>