<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Optimization of Artificial Neural Network Hyperparameters for Processing Retrospective Information</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>A. F. Rogachev</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Volgograd State Agricultural University</institution>
          ,
          <addr-line>26, Universitetskiy Avenue, Volgograd, 400002, Russian Federation</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Volgograd State Technical University</institution>
          ,
          <addr-line>28, Lenina Avenue, Volgograd, 400005, Russian Federation</addr-line>
        </aff>
      </contrib-group>
      <abstract>
<p>Justifying the choice of architecture and hyperparameters for artificial neural networks (ANNs) aimed at solving various classes of applied problems is a scientific and methodological challenge. Optimizing the selection of ANN hyperparameters improves both the quality and the speed of ANN training. Various methods for optimizing this selection are known (evolutionary computation, genetic algorithms, etc.), but they require additional software. To streamline the selection of ANN hyperparameters, Google Research has developed the KerasTuner software tool, a platform for the automated search for optimal combinations of hyperparameters. KerasTuner supports several methods: random search, Bayesian optimization, and Hyperband. In the numerical experiments conducted by the author, 14 hyperparameters were varied, including the number of convolutional-layer blocks and the filters forming them, the type of activation function, the parameters of the "dropout" layers, and others. The studied tools demonstrated high efficiency while simultaneously varying more than a dozen optimized parameters of the convolutional network. The computation time on the Colaboratory platform for the various combined ANN architectures studied, including recurrent (RNN) networks, was several hours, even with GPU accelerators. For ANNs aimed at processing and recognizing retrospective information, the recognition quality was raised to 80...95%.</p>
      </abstract>
      <kwd-group>
        <kwd>Artificial Neural Network</kwd>
        <kwd>Hyperparameters</kwd>
        <kwd>Retrospective Information</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Neural network technologies are successfully applied to problems in various
sectors of the economy, including industry, agriculture, and medicine [
        <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
        ].
Monographs and journal publications by F. Chollet, Y. LeCun, Y. Bengio, and G.
Hinton [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3-5</xref>
        ], as well as by the Russian researchers S. Nikolenko, A. Kadurin, E.
Arkhangelskaya, I. L. Kashirina, M. V. Demchenko, and A. Sozykin, are devoted to
substantiating the choice of architecture and hyperparameters for artificial neural
networks (ANNs) [
        <xref ref-type="bibr" rid="ref6 ref7 ref8 ref9">6-9</xref>
        ]. We also note a number of publications by Y. Jia, D. Kruchinin, and S. Bahrampour
devoted to the scientific and methodological aspects of ANN design and to software
methods for optimizing their training procedures [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10-12</xref>
        ].
      </p>
      <p>The authors mentioned above note the difficulty of justifying the choice of ANN
architecture and hyperparameters for various classes of applied problems. Methods
for optimizing ANN hyperparameters are known, for example, those based on genetic
algorithms, but they require writing additional software.</p>
      <p>
        Of particular interest is the publication by L. Li, K. Jamieson, G. DeSalvo, A.
Rostamizadeh, and A. Talwalkar on the Hyperband algorithm underlying the KerasTuner
tool developed by Google Research to optimize the selection of ANN hyperparameters [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. KerasTuner
is an easy-to-use hyperparameter optimization platform that addresses the problem of
searching for an optimal combination of hyperparameters [
        <xref ref-type="bibr" rid="ref14 ref15">14-15</xref>
        ]. As noted in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ],
"…many of today's state-of-the-art results, such as EfficientNet, were discovered via
sophisticated hyperparameter optimization algorithms". This tool is now part of
the Keras ecosystem, but the methodological and applied aspects of its use, as well
as the effectiveness of various architectures, have not been sufficiently studied.
      </p>
      <p>The issues of text data analysis, including natural language processing (NLP), are
considered in detail by researchers such as B. Bengfort, R. Bilbro, T. Ojeda, H. Palangi, A.
Surkova, and I. Chernobaev, who note additional difficulties in processing Russian-language
data [16-19].</p>
    </sec>
    <sec id="sec-2">
      <title>Materials and methods</title>
      <p>As a convenient tool for creating software prototypes, the author used the
popular Python language (v. 3.7). For rapid prototyping, Google Colaboratory was
used: a cloud platform from Google designed to promote machine learning and deep
neural network technologies. The Colaboratory platform comes with many of the
necessary libraries preinstalled, as well as fairly powerful Tesla K80 GPUs that
significantly accelerate neural network training.</p>
      <p>KerasTuner was used as the tool for searching for optimized hyperparameters. It
allows creating custom instances of the Hyperband class, whose parameters are shown
in Table 1.</p>
      <p>
        In the KerasTuner toolkit, you can use the random search, Bayesian optimization, or
Hyperband methods [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>To start the ANN hyperparameter optimization procedure, call the "tuner.search"
method.</p>
      <p>To test the functioning of the hyperparameter search module, you can use the
well-known CIFAR-10 data set, which is built into TensorFlow.</p>
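      <p>The data set can be loaded directly from Keras (a sketch; the first call downloads the archive, roughly 170 MB, and caches it locally):</p>

```python
# Load the CIFAR-10 data set bundled with TensorFlow/Keras:
# 50,000 training and 10,000 test images of 32x32 RGB pixels, 10 classes.
from keras.datasets import cifar10

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape)  # (50000, 32, 32, 3)
```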
      <p>To study the use of the Keras library's KerasTuner tool on the example of a
convolutional ANN, software modules were adapted for creating a network whose
hyperparameters usually do not change during training. You must specify a
function that varies the necessary hyperparameters.</p>
      <p>These parameters were the number of blocks of convolutional layers and their
filters, the type of activation function, the parameters of the regularizing “dropout”
layers, the type of pooling, etc. (Figure 1).</p>
      <p>It is possible to set initial (default) parameter values from within the range of
variation.</p>
      <p>
        After that, we create an instance of the tuner, which uses the “build_model(hp)”
function prepared above to build the model. In the fragment below, the “Hyperband”
optimization algorithm is used to search for the ANN hyperparameters.
Note that the number of ANN training runs can be limited with the max_trials parameter,
which is recommended to be set on the order of several hundred [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>As output, the module reports the dimension of the hyperparameter search space,
the current values of the varied ANN parameters, and the value of the "objective"
metric.</p>
      <p>The iterative search over combinations of parameters is quite lengthy and
requires a GPU. The optimization software module varies the parameters in a space
of dimension 14.</p>
      <p>Visualization of the main results of optimizing the ANN parameters using
tuner.search, performed on the Colaboratory platform with GPU accelerators, is
shown in Figure 4, a)...d).</p>
      <p>In the diagrams in Figure 4, the ordinate axis shows the values of “validation
accuracy” achieved by the ANN during training, evaluated on the test sample.</p>
      <p>Diagram a) shows the influence of the number of feature maps in the first
convolutional layer of the network, diagram b) in the second, diagram c) in the third,
and diagram d) the effect of the number of neurons in the first hidden layer.</p>
    </sec>
    <sec id="sec-3">
      <title>Discussion</title>
      <p>Analysis of the diagrams shows the influence of the set of basic hyperparameters
of the optimized ANN on its recognition accuracy. The value
objective = 'val_accuracy', computed on the test sample (Figure 3), was taken as the
evaluation metric. Each of the variants of varying individual hyperparameters
presented in Figure 4 is multimodal, especially diagrams a) and b), so it is
impossible to recommend a priori an unambiguous combination of preferred values of
the studied hyperparameters.</p>
      <p>After the hyperparameter selection procedure completes, the best of the models
found during the search can be retrieved with the “get_best_models” function. The
numerical values of the optimal hyperparameters found during the automated search
can also be inspected.</p>
      <p>Note the significant computation time: several hours, even with GPU
accelerators. The optimized neural networks are used to determine the authorship of
the natural-language text corpora prepared for training.</p>
      <p>It was experimentally established that, among the key hyperparameters, the number
of convolutional layers and of the neurons in them, as well as the parameters of the
convolutional layers and their combination, have the greatest influence.
The study of the automated selection of ANN hyperparameters with the “KerasTuner”
tool showed the following.</p>
      <p>The “KerasTuner” tool demonstrated high optimization efficiency while
simultaneously varying about one and a half dozen parameters of the convolutional
network, but the computation time on the Colaboratory platform for the studied ANN
architectures was several hours, even with GPU accelerators. For ANNs aimed at
processing and recognizing natural-language (NLP) text, the recognition quality was
improved to 80...95%.</p>
      <p>Graphical analysis of the influence of varying individual hyperparameters on
ANN quality revealed multimodal diagrams for various combinations of
hyperparameters, especially for the number of feature maps of the convolutional
layers, so a combination of preferred values of the studied hyperparameters cannot
be recommended a priori. Instead, it is desirable to vary the hyperparameters
jointly in an automated way to improve the quality of ANN operation.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>The reported study was funded by RFBR and EISR according to the research project
No. 20-011-31648\20.</p>
      <p>16. Puchkov, A., Dli, M., Kireyenkova, M.: Fuzzy classification on the base of convolutional
neural networks. Advances in Intelligent Systems and Computing, 902, 379-91 (2020).
17. Dutta, S. et al.: A comparative study of deep learning models for medical image
classification. IOP Conf. Ser.: Mater. Sci. Eng., 263, 042097 (2017).
18. Lomakina, L., Rodionov, V. and Surkova, A.: Hierarchical clustering of text documents.
Automation and Remote Control, 75(7), 1309-15 (2014).
19. Zhevnerchuk, D., Surkova, A., Lomakina, L. and Golubev, A.: Semantic modeling and
structural synthesis of onboard electronics protection means as open information system.
Journal of Physics: Conference Series, 1015, 032157 (2018).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Rogachev</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Melikhova</surname>
          </string-name>
          , E.:
          <source>IOP Conf. Ser.: Earth Environ. Sci.</source>
          ,
          <volume>403</volume>
          ,
          <issue>012175</issue>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>LeCun</surname>
          </string-name>
          , Y.,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Hinton</surname>
          </string-name>
          , G.:
          <article-title>Deep learning</article-title>
          .
          <source>Nature</source>
          ,
          <volume>521</volume>
          (
          <issue>7553</issue>
          ),
          <fpage>436</fpage>
          -
          <lpage>444</lpage>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Morozov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalnichenko</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Proskurin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Mezentseva</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Investigation of forecasting methods of the state of complex IT-projects with the use of deep learning neural networks</article-title>
          .
          <source>Advances in Intelligent Systems and Computing</source>
          ,
          <volume>1020</volume>
          ,
          <fpage>261</fpage>
          -
          <lpage>280</lpage>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gevorkyan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demidova</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demidova</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Sobolev</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Review and comparative analysis of machine learning libraries for machine learning</article-title>
          .
          <source>Discrete and Continuous Models and Applied Computational Science</source>
          ,
          <volume>27</volume>
          (
          <issue>4</issue>
          ),
          <fpage>305</fpage>
          -
          <lpage>15</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Tahmassebi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>IDEEPLE: deep learning in a flash</article-title>
          .
          <source>Proceedings of SPIE - The International Society for Optical Engineering</source>
          ,
          <volume>106520S</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Tutubalina</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Nikolenko</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Combination of deep recurrent neural networks and conditional random fields for extracting adverse drug reactions from user reviews</article-title>
          .
          <source>Journal of Healthcare Engineering</source>
          ,
          <volume>9451342</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kashirina</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          et al.:
          <source>J. Phys.: Conf. Ser.</source>
          ,
          <volume>1203</volume>
          ,
          <issue>012090</issue>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Shaikhislamov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sozykin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Voevodin</surname>
          </string-name>
          , V.:
          <article-title>Survey on software tools that implement deep learning algorithms on Intel/x86 and IBM/Power8/Power9 platforms</article-title>
          .
          <source>Supercomputing Frontiers and Innovations</source>
          ,
          <volume>6</volume>
          (
          <issue>4</issue>
          ),
          <fpage>57</fpage>
          -
          <lpage>83</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Sozykin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          et al.:
          <source>Teaching heart modeling and simulation on parallel computing systems Lecture Notes in Computer Science</source>
          ,
          <volume>9523</volume>
          ,
          <fpage>102</fpage>
          -
          <lpage>113</lpage>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shelhamer</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Donahue</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          et al.:
          <article-title>Caffe: Convolutional Architecture for Fast Feature Embedding</article-title>
          .
          <source>Proceedings of the 22nd ACM International Conference on Multimedia</source>
          (Orlando, FL, USA, November 03-07, 2014),
          <fpage>675</fpage>
          -
          <lpage>78</lpage>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kruchinin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dolotov</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kornyakov</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          et al.:
          <article-title>Comparison of Deep Learning Libraries on the Problem of Handwritten Digit Classification Analysis of Images, Social Networks and Texts</article-title>
          .
          <source>Communications in Computer and Information Science</source>
          ,
          <volume>542</volume>
          ,
          <fpage>399</fpage>
          -
          <lpage>411</lpage>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Bahrampour</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramakrishnan</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schott</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          et al.:
          <article-title>Comparative Study of Deep Learning Software Frameworks</article-title>
          , https://arxiv.org/abs/1511.06435, last accessed 2020/10/21.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Jamieson</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>18</volume>
          ,
          <fpage>1</fpage>
          -
          <lpage>52</lpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Glushchenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petrov</surname>
          </string-name>
          , V.:
          <article-title>On comparative evaluation of effectiveness of neural network and fuzzy logic based adjusters of speed controller for rolling mill drive</article-title>
          .
          <source>Studies in Computational Intelligence</source>
          ,
          <volume>799</volume>
          ,
          <fpage>144</fpage>
          -
          <lpage>50</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>O'Malley</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Hyperparameter tuning with Keras Tuner</article-title>
          . https://blog.tensorflow.org/2020/01/hyperparameter-tuning-with-keras-tuner.html,
          last accessed 2020/10/21.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>