<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Model Based on a Vector of Uncorrelated Features</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andriy Fesenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Volodymir Druzhynin</string-name>
          <email>volodymirdruzhynin68@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nataliia Tsopa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vladyslav Synhaivskyi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Beresteiskyi</institution>
          ,
          <addr-line>Kyiv, 03056</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kyiv Polytechnic Institute”</institution>
          ,
          <addr-line>37, Prospect</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>National Technical University of Ukraine “Igor Sikorsky</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Taras Shevchenko National University of Kyiv</institution>
          ,
          <addr-line>60 Volodymyrska Street, Kyiv, 01033</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <fpage>20</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>This article explores the task of image recognition based on multiple unrelated features, using the example of identifying circulating coins with convolutional neural networks. The paper outlines the conventional method of image recognition, which employs a standard convolutional neural network with a single output, where each image corresponds to a distinct class. An analysis of the results and methodology suggests that this architecture is not optimal for datasets with a large number of classes. To enhance recognition accuracy, we advocate a convolutional neural network architecture with multiple outputs, in which the network structure splits into several branches at a particular stage. With this type of network, each image corresponds to a list of independent characteristics rather than a single composite category. This division creates multiple image recognition subtasks, each assigned to a distinct branch of the neural network. The study compares the traditional neural network with a network with multiple outputs in order to assess the pros and cons of each approach and to examine impartially the reasons for the observed differences in results. The results indicate that a less conventional convolutional neural network architecture with multiple outputs can overcome the difficulties associated with a large number of composite classes and the ambiguity of identifying circulating coins by several traits. The suggested technique broadens the opportunities for using neural networks in image recognition tasks that have numerous classes and a small amount of training data.</p>
        <p>Keywords: convolutional neural network, neural network with multiple outputs, image recognition</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Keywords</title>
      <p>machine learning, artificial intelligence.</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>Today, the convolutional neural network (CNN) is widely considered the most effective method
for image recognition. The aim of this study is to create a CNN that can identify images based on a
range of unrelated characteristics, using the identification of circulating coins from various European
countries as an example. Alternative techniques were also examined to enhance the precision of image
recognition. To achieve this goal, we scrutinized the traditional methodology for image recognition
and made modifications to the neural network model that account for the unique characteristics of
the input data.</p>
      <p>Most features of a coin, such as its denomination, currency unit, country, year of issue, and mint
markings, can be determined from images of its front and back. Measuring the diameter and weight
of the coin, however, requires additional tools. Because photographing a coin's obverse and reverse is
effortless and requires minimal equipment, using these photos to identify its critical characteristics is
the most convenient way for users to recognize coins.</p>
      <p>2023 Copyright for this paper by its authors. CEUR Workshop Proceedings (ceur-ws.org).</p>
      <p>To effectively train a CNN, having a large dataset of labeled images that correspond to specific
categories is crucial. Although specialized websites on the internet offer readily available datasets for
most image recognition tasks, this research necessitates the manual labeling of images from freely
available online sources and personal photographs since there is no available premade dataset.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Analysis of Recent Research</title>
      <p>
        When addressing image recognition tasks, it is customary to employ a classic approach - a
standard convolutional neural network initially developed by Yann LeCun in 1995 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Such a
network accepts an image as its input and generates a vector of numbers that expresses the neural
network's degree of confidence that the image belongs to each designated category. In this
approach, images within the training dataset are assigned to distinct classes based on their
defining features. To identify circulating coins, each coin image must be assigned a single composite
attribute comprising its denomination, currency type, and country. Consequently, a
separate category such as "1 coin, Ukraine" would be created for each distinct coin. For European
nations, this classification scheme would result in hundreds of such categories, and because the
dataset contains a limited number of images, each class would end up with only a small number of images.
      </p>
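      <p>The confidence vector described above can be illustrated with a minimal sketch. It assumes a softmax output layer, which the text does not state explicitly, and the class names and raw scores below are hypothetical values chosen only for illustration.</p>
      <preformat><![CDATA[
```python
import math


def softmax(scores):
    """Turn raw class scores into confidences that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical composite classes and raw network scores (not from the study).
classes = ["1 coin, Ukraine", "2 coins, Ukraine", "1 coin, Poland"]
scores = [2.0, 0.5, 0.1]

confidences = softmax(scores)
# The predicted class is the one with the highest confidence.
predicted = classes[confidences.index(max(confidences))]
print(predicted)
```
]]></preformat>
      <p>With hundreds of composite classes the vector simply becomes longer, and every class needs enough training images for its score to be reliable.</p>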
      <p>
        A conventional convolutional neural network includes sequential convolution layers with the
ReLU activation function and utilizes filters, as shown in Fig. 1 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. As the image is processed, these
filters generate feature maps for each filter, which are then sub-sampled by pooling layers to decrease
their spatial dimensions [3]. Image classification is performed at the end of this network utilizing one
or more fully connected neural layers.
      </p>
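      <p>As a rough illustration of the two operations mentioned above, the following sketch applies ReLU and a 2x2 max pooling step to a small feature map in plain Python. The feature map values are made up for the example, and the non-overlapping 2x2 window is an assumption that matches the pooling size reported later in the paper.</p>
      <preformat><![CDATA[
```python
def relu(feature_map):
    """Replace every negative value with zero."""
    return [[max(0.0, value) for value in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled_row = []
        for c in range(0, cols - 1, 2):
            window = (
                feature_map[r][c],
                feature_map[r][c + 1],
                feature_map[r + 1][c],
                feature_map[r + 1][c + 1],
            )
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Illustrative 4x4 feature map (not data from the study).
feature_map = [
    [1.0, -2.0, 3.0, 0.5],
    [-1.0, 4.0, -3.0, 2.0],
    [0.0, -5.0, 1.5, -0.5],
    [2.5, 1.0, -1.0, -2.0],
]

activated = relu(feature_map)
pooled = max_pool_2x2(activated)
print(pooled)  # [[4.0, 3.0], [2.5, 1.5]]
```
]]></preformat>
      <p>Each pooling step halves both spatial dimensions, which is why deeper layers work on progressively smaller feature maps.</p>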
      <p>For numerous image recognition tasks, a pre-existing neural network architecture is frequently
utilized, with architectures that have won the annual ImageNet Large Scale Visual Recognition
Challenge (ILSVRC) [4-9] often being selected. This challenge assesses algorithms for image
recognition and object detection. The VGG16 [10-14] and ResNet50 [15-19] networks are the most
commonly employed, both trained on the ImageNet dataset. The study utilizes Stevo Bozinovski's
[20] transfer learning technique to transfer knowledge from the extensive ImageNet dataset to a
smaller one. However, due to the specificity of the input data and limited neural network training
resources, this approach does not yield notably improved outcomes. As a result, a tailored neural
network design is imperative.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Main Section</title>
      <p>For this study, a dataset was compiled that included over 40,000 photographs of current coins from
36 countries, including Ukraine, Poland, the United Kingdom, and other European Union nations. The
dataset features two square images (150x150 pixels) of each coin, showcasing the obverse and reverse
sides against either a white or light gray backdrop. All photographs were standardized at 300 pixels
wide and 150 pixels high. The dataset was split into two parts. 80% of the total amount was utilized as
training data for the neural network. The other 20% of the dataset was used as test data to evaluate the
network's performance. Within the training data, 80% was solely reserved for training, while 20% was
employed for validation during each epoch of neural network training. This method enabled swift
overfitting detection. Table 1 displays the distribution of images in the dataset across various
countries.</p>
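      <p>The nested split described above can be sketched as follows. This is only an illustration: the round figure of 40,000 images stands in for the "over 40,000" reported, and the shuffling, seed, and function name are assumptions rather than details taken from the study.</p>
      <preformat><![CDATA[
```python
import random


def split_indices(n_images, test_fraction=0.2, val_fraction=0.2, seed=0):
    """Shuffle image indices and split them into train, validation and test parts."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)

    # First hold out the test share of the whole dataset.
    n_test = round(n_images * test_fraction)
    test = indices[:n_test]
    remaining = indices[n_test:]

    # Then reserve the validation share of what is left.
    n_val = round(len(remaining) * val_fraction)
    val = remaining[:n_val]
    train = remaining[n_val:]
    return train, val, test


train, val, test = split_indices(40_000)  # assumed round dataset size
print(len(train), len(val), len(test))  # 25600 6400 8000
```
]]></preformat>
      <p>Under these assumptions, 64% of all images are used for weight updates, 16% for validation after each epoch, and 20% for the final test.</p>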
      <p>The dataset comprised 353 image classes categorized according to the format "country of origin -
currency unit - denomination - coin type." The pictures of coins, with a 1:1 aspect ratio and
showing the obverse or reverse side of the coin, were photographed against a white, light gray, light
blue, or light yellow background. The background hue shifted depending on the technical
specifications of the camera used, as well as on the intensity and color of the lighting. To enlarge
the dataset and ensure precise image recognition even under suboptimal conditions, data
augmentation was employed.</p>
      <p>The neural network was developed in the Google Colab environment, which provides ample
computational power for creating, training, testing, and deploying neural networks. Google Colab
notebooks also give access to TensorFlow, the open-source machine learning library [21], and its
high-level API, Keras [22]. Although TensorFlow supports multiple programming languages, Python
was chosen for its widespread use in developing neural networks. Given the atypical rectangular
format of the input data (300x150 pixels), the narrow standardization in image presentation (both
sides of the coin displayed on a white or light gray background), and the limited resources available,
a customized neural network was chosen for analysis. See Figure 2 for the code outlining this network.</p>
      <p>This neural network consists of three pairs of convolutional layers with 64, 128, and 256 filters,
respectively, each filter 3x3 in size, which alternate with max pooling layers. The max pooling layers
use filters of dimensions 2x2 that select the highest value and pass it to the subsequent layer of the
neural network. The network ends with two fully connected layers with the ReLU activation function,
containing 512 and 256 neurons, respectively, followed by a single fully connected layer of 212
neurons for classification. Figure 3 illustrates the resulting graph of the network. The network's
performance results, obtained after training for 10 epochs, are displayed in Figure 4.</p>
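      <p>To make the size of this architecture more concrete, the sketch below traces feature map shapes and counts trainable parameters for the layers listed above. It is not the authors' code from Figure 2. It assumes an RGB input of 150x300 pixels, two convolutions per block, "same" padding, and a flatten step before the dense layers, none of which the text confirms.</p>
      <preformat><![CDATA[
```python
def conv_pool_block(shape, filters, convs=2, kernel=3):
    """Apply `convs` same-padded convolutions followed by one 2x2 max pooling.

    Returns the output shape and the number of trainable parameters.
    """
    height, width, channels = shape
    params = 0
    for _ in range(convs):
        params += (kernel * kernel * channels + 1) * filters  # weights + biases
        channels = filters
    return (height // 2, width // 2, channels), params


def dense_params(inputs, units):
    """Parameters of a fully connected layer: weights + biases."""
    return (inputs + 1) * units


shape = (150, 300, 3)  # assumed: height, width, RGB channels
layer_params = {}
for filters in (64, 128, 256):
    shape, layer_params[f"conv_block_{filters}"] = conv_pool_block(shape, filters)

flat_size = shape[0] * shape[1] * shape[2]  # assumed flatten before dense layers
for inputs, units in ((flat_size, 512), (512, 256), (256, 212)):
    layer_params[f"dense_{units}"] = dense_params(inputs, units)

print(shape, flat_size)
for name, count in layer_params.items():
    print(name, count)
```
]]></preformat>
      <p>Under these assumptions, the first fully connected layer holds by far the most parameters, which is consistent with the observation that making the network larger mainly increases resource demand.</p>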
      <p>For further clarity, accuracy and loss function graphs for both training and validation data are also
shown in Figure 5.</p>
      <p>Based on the validation data graphs, it is apparent that increasing the number of epochs leads to
overfitting. This happens when the network memorizes the training data: recognition accuracy on the
training data keeps improving, while accuracy on the validation data stays almost the same. The
recognition accuracy on test data was 0.9365 (Figure 6).</p>
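      <p>One simple way to read the onset of overfitting from a validation curve is to find the epoch after which the validation loss stops improving. The sketch below shows this idea. The loss values and the patience of two epochs are illustrative assumptions, not measurements or settings from the study.</p>
      <preformat><![CDATA[
```python
def best_epoch(val_losses, patience=2):
    """Return the 1-based epoch with the lowest validation loss.

    The scan stops once the loss has failed to improve for `patience`
    consecutive epochs, which mimics early stopping.
    """
    best_index, best_loss, waited = 0, float("inf"), 0
    for index, loss in enumerate(val_losses):
        if loss < best_loss:
            best_index, best_loss, waited = index, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_index + 1


# Illustrative validation losses for 10 epochs (not values from the study).
val_losses = [1.20, 0.70, 0.45, 0.38, 0.36, 0.37, 0.39, 0.41, 0.44, 0.48]
print(best_epoch(val_losses))  # 5
```
]]></preformat>
      <p>Training beyond the returned epoch would only improve the fit to the training data in this example.</p>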
      <p>Throughout the project, numerous experiments were performed in an attempt to improve the
neural network by increasing its complexity with more layers and neurons. These
adjustments had a negligible impact on the recognition accuracy on test data, which
remained within the range of 93-94%, while both the demand for computational resources
and the training time rose significantly. To improve recognition accuracy, the options
include enlarging the image dataset through image augmentation or using an alternative
neural network architecture. Considering the specificity of the data, a probable solution
is to adopt a convolutional neural network with multiple outputs.</p>
    </sec>
    <sec id="sec-5">
      <title>4. Overview of the neural network with multiple outputs</title>
      <p>When examining why the traditional convolutional neural network achieves relatively low
accuracy, an unusual aspect of the data comes to light. Defining a circulating coin by the
composite trait "denomination, currency unit, country" forces a very narrow categorization
of images, resulting in a substantial number of classes (212, specifically). Each category
contains few images, and having a restricted number of images spread across numerous
categories lowers the accuracy achieved in training.</p>
      <p>On closer examination, each coin class combines three distinct characteristics:
denomination, currency unit, and country of origin. None of these traits alone provides
enough information to identify a specific coin definitively. However, classifying by
denomination alone, for example, yields significantly fewer classes for the same number of
training images. Each class then contains more images, which enhances recognition accuracy
for that particular feature. The accuracy of identifying coins based solely on their currency
or country of origin increases in the same way. In addition, extending the dataset with images
of modern circulating coins from another country may require adding approximately ten new
classes to the combined feature. By contrast, the list of country classes would require only
one new class, while zero to two new classes may be added for the currency unit and
denomination.</p>
      <p>When working with a dataset that includes many composite classes, using a
convolutional neural network with multiple outputs [23] can prove beneficial. This
architecture receives an input image and generates several correspondence vectors, each
corresponding to a different classification. To achieve this, the neural network splits into
multiple branches at a specific point. This branching can occur at any
stage of the neural network (refer to Figure 7).</p>
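      <p>The effect of splitting the composite class can be shown with simple arithmetic. The sketch below uses the class counts reported in this paper (212 composite classes, and 8, 8, and 26 classes for denomination, currency, and country). The round total of 40,000 images and the even spread of images across classes are simplifying assumptions.</p>
      <preformat><![CDATA[
```python
n_images = 40_000  # assumed round dataset size
composite_classes = 212
feature_classes = {"denomination": 8, "currency": 8, "country": 26}

# Average number of images per class if images were spread evenly.
avg_composite = n_images / composite_classes
avg_feature = {name: n_images / count for name, count in feature_classes.items()}

print(f"composite: {avg_composite:.0f} images per class")
for name, average in avg_feature.items():
    print(f"{name}: {average:.0f} images per class")

# Total number of output neurons across the three separate classifiers.
print(sum(feature_classes.values()))  # 42
```
]]></preformat>
      <p>Even with the same images, each per-feature classifier sees many more examples per class than the composite classifier does.</p>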
      <p>This network architecture is commonly used to handle data in different formats that
demand distinct processing methods and to combine the outcomes of classification and
regression analysis [24]. Moreover, it can handle images containing multiple unrelated
characteristics. When a convolutional neural network with multiple outputs is used for coin
recognition, three correspondence vectors are obtained: one for the denomination, one for
the currency unit, and one for the country. With this approach, the recognition accuracy
for each feature is significantly improved in comparison to a regular network with a single
output that uses compound characteristics as classes [12].</p>
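      <p>The three correspondence vectors can be turned back into a coin description by taking the most confident entry of each vector. The following sketch shows one way to do this. The label lists and confidence values are hypothetical and much shorter than the real ones.</p>
      <preformat><![CDATA[
```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=values.__getitem__)


def decode(outputs, labels):
    """Pick the most confident label from each output vector."""
    return {name: labels[name][argmax(vector)] for name, vector in outputs.items()}


# Hypothetical label lists for the three outputs (illustrative only).
labels = {
    "denomination": ["1", "2", "5"],
    "currency": ["hryvnia", "zloty", "euro"],
    "country": ["Ukraine", "Poland", "Germany"],
}

# Hypothetical confidence vectors, one per network output.
outputs = {
    "denomination": [0.10, 0.85, 0.05],
    "currency": [0.05, 0.05, 0.90],
    "country": [0.20, 0.10, 0.70],
}

coin = decode(outputs, labels)
print(coin)
```
]]></preformat>
      <p>Each output is decided independently, so a new country adds one entry to a single label list instead of a whole group of composite classes.</p>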
      <p>Another advantage of this approach is the ability to fine-tune the neural network with
greater accuracy and conserve resources during training by incorporating pre-trained models
as branch layers after branching. This matters because models for different features may
require different numbers of training epochs. In this study, we implemented a neural network
that splits into three branches at the beginning, each of which contains a single layer that
wraps a convolutional model. This arrangement of layers does not change the behavior of the
overall neural network, but it enables each branch to be trained individually and its weights
to be reused in the general network. Figure 8 shows the code for the general neural network,
while Figure 9 depicts its graphical representation.</p>
      <p>The value_model, currency_model, and country_model function as grouped layers and
identify the denomination, currency type, and country, respectively, using a structure similar
to that of a conventional convolutional neural network model (refer to Figure 2). However,
they differ by having fewer neurons in the final classification layer: eight
for the denomination and currency models and twenty-six for the country model.</p>
      <p>The models were trained separately, with the same split of data into training and test sets.
However, each model had a different split of the training data into training and validation subsets.
Figures 10, 12, and 14 show the accuracy and loss graphs for the training and validation data of the
denomination, currency, and country recognition models, respectively. The denomination was
recognized with an accuracy of 0.9941, the currency unit with an accuracy of 0.9989, and the country
with an accuracy of 0.9641, as shown in Figures 11, 13, and 15. The graphs show when each model
begins to overfit, which makes it possible to determine the necessary number of training epochs for
each model. The test data reveal high accuracy in denomination and currency recognition, roughly
99%, while country recognition accuracy is lower, approximately 96-97%. This lower accuracy may be
attributed to the larger number of categories and characteristics in the input data. Specifically, the
presence of 2 euro commemorative coins in the dataset complicates identifying their country of
origin within the European Union.</p>
      <p>Separating denomination, currency, and country recognition leads to higher accuracy than the
traditional network. In addition, the denomination and currency are determined correctly with very
high probability. The final accuracy, calculated by multiplying the branch results, is 0.9574,
surpassing the accuracy of the traditional convolutional neural network by 0.0209. This improvement
shows promise, and the flexibility to configure and train individual models within a three-output
convolutional neural network can lead to enhanced results.</p>
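      <p>The final accuracy quoted above can be reproduced from the three branch accuracies. The short sketch below performs the multiplication. Treating the product as the accuracy of the full coin identification implicitly assumes that the errors of the three branches are independent.</p>
      <preformat><![CDATA[
```python
import math

# Test accuracies reported for the three branch models.
branch_accuracy = {"denomination": 0.9941, "currency": 0.9989, "country": 0.9641}
# Test accuracy reported for the single-output network.
single_output_accuracy = 0.9365

# Probability that all three features are correct, assuming independent errors.
combined = math.prod(branch_accuracy.values())
gain = combined - single_output_accuracy

print(round(combined, 4), round(gain, 4))  # 0.9574 0.0209
```
]]></preformat>
      <p>Because the country branch has the lowest accuracy, it limits the combined result the most.</p>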
    </sec>
    <sec id="sec-6">
      <title>5. Conclusions</title>
      <p>Recognizing coins from images is a challenging task because it relies on distinguishing
unrelated characteristics. However, this allows various approaches to classifying images and
constructing convolutional neural networks. Our study examined the traditional approach, in which
each image belongs to a single composite class that identifies its denomination, currency, and
country, and the neural network has one output. In addition, a method was presented that uses a
neural network with multiple outputs: each image feature is classified separately, and the neural
network has three outputs, one for each feature.</p>
      <p>Due to the large number of image classes with few images each in the conventional approach, the
recognition result was suboptimal at 93-94%. Splitting the neural network into three separate
branches, one for each characteristic, resulted in significant improvement, with an accuracy of about
99% for denomination and currency and 96-97% for the country. A noteworthy advantage of this
approach is the option to flexibly adjust and train separate models for each characteristic and
incorporate them as layers of separate branches of the neural network. One drawback of this approach
is the significant increase in training time and resource use due to the multiple-fold increase in the
number of neural network parameters. Nonetheless, training the embedded models separately can
decrease the amount of resources used concurrently during network training.</p>
      <p>Thus, the proposed approach to recognizing circulating coins differs from classical methods using
convolutional neural networks. It considers not only a narrow classification into one complex class
but also the separation of each class into three separate characteristics. This approach solves the
problem of complex classes and expands the possibilities of using limited training data, providing
more accurate recognition. A branched architecture of a convolutional neural network with multiple
outputs is proposed, which takes each characteristic into account separately, corrects the limitations
of classical models, and makes it possible to work effectively with small and complex datasets.</p>
    </sec>
    <sec id="sec-7">
      <title>6. References</title>
      <p>[3] Yampolskyi Leonid Stefanovych. Neurotechnologies and neurocomputer systems: a textbook / L.S. Yampolsky, O.I. Lisovychenko, V.V. Oliynyk - K.: Dorado-Druk, 2016. 576 p.</p>
      <p>[4] ImageNet [Electronic resource] URL: https://image-net.org/challenges/LSVRC/index.php.</p>
      <p>[5] Long-tailed visual recognition with deep models: A methodological survey and evaluation. / Yu Fu, Liuyu Xiang, Guiguang Ding and other // Neurocomputing Volume 509, 14 October 2022, Pages 290-309. https://doi.org/10.1016/j.neucom.2022.08.031</p>
      <p>[6] Application of a convolutional neural network with multiple outputs for recognizing circulating coins / E.Y. Vaivala, N.V. Tsiopa, V.S. Shmidke // Collection of scientific papers of the Military Institute of Taras Shevchenko National University of Kyiv - K. : Military Institute of Taras Shevchenko National University of Kyiv, 2021 - № 71. - P. 49-58.</p>
      <p>[7] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015.</p>
      <p>[8] One evolutionary algorithm deceives humans and ten convolutional neural networks trained on ImageNet at image recognition / Ali Osman Topal, Raluca Chitic, Franck Leprévost // Applied Soft Computing, Volume 143, August 2023, 110397. https://doi.org/10.1016/j.asoc.2023.110397</p>
      <p>[9] Knowledge driven weights estimation for large-scale few-shot image recognition / Jingjing Chen, Linhai Zhuo, Zhipeng Wei and other // Pattern Recognition, Volume 142, October 2023, 109668. https://doi.org/10.1016/j.epsr.2023.109241</p>
      <p>[10] Simonyan K. Very Deep Convolutional Networks for Large-Scale Image Recognition / K. Simonyan, A. Zisserman., 2015. – 14 p. https://arxiv.org/abs/1409.1556</p>
      <p>[11] High accuracy keyway angle identification using VGG16-based learning method / Soma Sarker, Sree Nirmillo Biswash Tushar, Heping Chen // Journal of Manufacturing Processes, Volume 98, 28 July 2023, Pages 223-233 https://doi.org/10.1016/j.jmapro.2023.04.019</p>
      <p>[12] MANet: A two-stage deep learning method for classification of COVID-19 from Chest X-ray images / Yujia Xu, Hak-Keung Lam, Guangyu Jia // Neurocomputing, Volume 443, 5 July 2021, Pages 96-105 https://doi.org/10.1016/j.neucom.2021.03.034</p>
      <p>[13] Multi-level residual network VGGNet for fish species classification / Eko Prasetyo, Nanik Suciati, Chastine Fatichah // Journal of King Saud University - Computer and Information Sciences, Volume 34, Issue 8, Part A, September 2022, Pages 5286-5295. https://doi.org/10.1016/j.jksuci.2021.05.015</p>
      <p>[14] Theckedath, D., Sedamkar, R.R. / Detecting Affect States Using VGG16, ResNet50 and SE-ResNet50 Networks. // SN COMPUT. SCI. 1, 79 (2020). https://doi.org/10.1007/s42979-020-0114-9</p>
      <p>[15] Deep Residual Learning for Image Recognition / K. He, X. Zhang, S. Ren, J. Sun., 2015. – 12 p.</p>
      <p>[16] Wen, L., Li, X. &amp; Gao, L. A transfer convolutional neural network for fault diagnosis based on ResNet-50. Neural Comput &amp; Applic 32, 6111–6124 (2020). https://doi.org/10.1007/s00521-019-04097-w</p>
      <p>[17] Deep learning model for defect analysis in industry using casting images / Rupesh Gupta, Vatsala Anand, Sheifali Gupta, Deepika Koundal // Expert Systems with Applications Volume 232, 1 December 2023, 120758. https://doi.org/10.1016/j.eswa.2023.120758</p>
      <p>[18] Mascarenhas, Sheldon and Mukul Agarwal. “A comparison between VGG16, VGG19 and ResNet50 architecture frameworks for Image Classification.” 2021 International Conference on Disruptive Technologies for Multi-Disciplinary Research and Applications (CENTCON) 1 (2021): 96-99.</p>
      <p>[19] Siva Satya Sreedhar, P. and N. Nandhagopal. “Classification Similarity Network Model for Image Fusion Using Resnet50 and GoogLeNet.” Intelligent Automation &amp; Soft Computing, vol.31 (2022), no.3, Pages 1331-1344.</p>
      <p>[20] Bozinovski S. Reminder of the First Paper on Transfer Learning in Neural Networks / Stevo Bozinovski., 2020. – 12 p.</p>
      <p>[21] Module: tf | TensorFlow v2.14.0 URL: https://www.tensorflow.org/api_docs/python/tf?hl=en.</p>
      <p>[22] Keras API reference URL: https://keras.io/api/.</p>
      <p>[23] D. Xu, Y. Shi, I. W. Tsang, Y. -S. Ong, C. Gong and X. Shen, "Survey on Multi-Output Learning," in IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 7, pp. 2409-2429, July 2020, doi: 10.1109/TNNLS.2019.2945133.</p>
      <p>[24] A survey on multi-output regression / H. Borchani, G. Varando, C. Bielza, P. Larranaga. // WIREs Data Mining Knowl Discov 2015, 5:216–233. doi: 10.1002/widm.1157</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>LeCun</surname>
            <given-names>Y.</given-names>
          </string-name>
          <article-title>Convolutional Networks for Images, Speech, and Time-Series</article-title>
          / Y. LeCun,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          . -
          <year>1995</year>
          . - 14 p.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Mishra</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Convolutional Neural Networks, Explained|by Mayank Mishra|Towards Data Science</article-title>
          .
          <source>Towards Data Science</source>
          .
          <year>2020</year>
          . Available online: https://towardsdatascience.com/convolutional-neural-networks-explained-9cc5188c4939.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>