<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Information Control Systems &amp; Technologies</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Hyperparameter tuning in the learning of multithreshold neurons</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vladyslav Kotsovsky</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>23</volume>
      <issue>25</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
<p>The modification of the online learning algorithm for multi-valued multithreshold neurons is proposed in the paper. Conditions are stated and proved that ensure finite, successful learning. The influence of the algorithm hyperparameters on the learning process is analyzed on the basis of simulation results. Recommendations are formulated concerning the choice of values of these hyperparameters, which may significantly reduce the learning time. The experimental results show that the proposed algorithm is able to greatly outperform the learning procedure of Obradović and Parberry. The obtained results can be useful in the design of artificial neural network classifiers employing multithreshold activation functions in network nodes.</p>
      </abstract>
      <kwd-group>
        <kwd>Multithreshold neural unit</kwd>
        <kwd>classification</kwd>
        <kwd>machine learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Neural networks (NN) became mainstream in modern artificial intelligence (AI) systems [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and
smart data processing [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. Both hardware [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and software infrastructure of AI [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] widely employ
concepts and solutions based on neural-like approaches [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ]. Different network architectures [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] as
well as appropriate learning and synthesis techniques [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ] provide artificial
NNs with powerful capabilities for solving numerous real-time problems. Modern NN-based AI systems depend on billions
of parameters [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and their behavior is influenced by many hyperparameters [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which are used in
the learning of the underlying machine learning (ML) model [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. This implies the importance of the
proper choice of these hyperparameters during the training process in order to adapt the AI system to
the solution of the given ML problem [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        The tremendous power of the latest AI systems is provided by the capabilities of the underlying NNs [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
Therefore, the main efforts in neural computation are devoted to the improvement of the network
capacities [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. It can be achieved in many ways [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The most popular one consists in increasing
the network size by using deeper models with many neurons in every hidden layer [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], as well as
the application of new hybrid network architectures, e.g., as in [
        <xref ref-type="bibr" rid="ref2 ref5 ref6">2, 5, 6</xref>
        ]. This approach can be
extremely successful, but usually it requires considerable computational resources and may be very
costly and inappropriate in many cases [
        <xref ref-type="bibr" rid="ref2 ref9">2, 9</xref>
        ].
      </p>
      <p>
        The second approach consists in the use of a relatively small NN enhanced by the application of
modified network nodes, which are more powerful than usual linear neural units with RELU- or
sigmoid-like activation functions [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. In the simplest cases a single such unit is sufficient to solve a
classification task on small- or medium-sized dataset [
        <xref ref-type="bibr" rid="ref11 ref15">11, 15</xref>
        ].
      </p>
      <p>
        In order to overcome the limitations of classical neural units, many modified models were
proposed, e.g., in [
        <xref ref-type="bibr" rid="ref16 ref2 ref9">2, 9, 16</xref>
        ]. They all were intended to increase the recognition capacity of a single neuron.
As mentioned in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], they can be divided into at least two classes.
      </p>
      <p>
        The first class contains models using modified modes of aggregating the input signals
of the neural unit instead of the usual weighted sum of inputs. This approach includes different
kernel models, which make the shape of decision region of the neural unit more complicated and
more appropriate to the distribution of data patterns [
        <xref ref-type="bibr" rid="ref2 ref9">2, 9</xref>
        ].
      </p>
      <p>
        The second class consists of models that benefit from the use of a modified activation function [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
This class is sometimes more useful than the first one, because its representatives adapt the kind of
activation to the particular task without adding many new parameters to the ML model [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Note
that this approach requires the development of special learning techniques adapted to the chosen
modification of the activation function [
        <xref ref-type="bibr" rid="ref18 ref9">9, 18</xref>
        ].
      </p>
      <p>
        The current research is devoted to the study of one kind of neural model belonging to the
second class: the multi-valued multithreshold neural unit [
        <xref ref-type="bibr" rid="ref19 ref20">19, 20</xref>
        ]. The goal of the research is the
design of a learning algorithm for such multithreshold units and the investigation of which values of the
algorithm hyperparameters should be used in order to speed up the training process and improve the
capacity of the resulting neuron.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>
        The multithreshold approach was proposed in early studies in threshold logic [
        <xref ref-type="bibr" rid="ref17 ref19 ref21">17, 19, 21</xref>
        ]. The first
models employed the multithreshold binary-valued activation in order to enhance the capacity of
the classical threshold gate based on the famous McCulloch and Pitts model [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. This enhancement
was theoretically confirmed in [
        <xref ref-type="bibr" rid="ref23 ref24">23, 24</xref>
        ], where it was shown that a linear threshold unit strengthened
by additional thresholds considerably outperforms the single-threshold gate. The explanations and
quantitative expressions of the increase of the unit capacity can be found in [
        <xref ref-type="bibr" rid="ref17 ref24">17, 24</xref>
        ]. Despite the
strict confirmation and justification of the advantage of multithreshold models in pattern
classification, the practical benefits of this approach were almost missing, because few synthesis (as
well as learning) algorithms were proposed for such multithreshold systems. This, in turn,
led to a decline of interest in the development and use of multithreshold models and systems
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Hardness results for multithreshold units stated in [
        <xref ref-type="bibr" rid="ref16 ref20">16, 20</xref>
        ] explain that the learning task for a
multithreshold unit is considerably harder in the sense of complexity theory than the similar task for a
single-threshold unit. This conclusion was also confirmed for general multithreshold neural units
with an arbitrary number of thresholds in [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. Paper [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] also contains the result concerning the
connection between multithreshold neurons and single-threshold neural networks with a single
hidden layer.
      </p>
      <p>
        Nevertheless, in [
        <xref ref-type="bibr" rid="ref26 ref27 ref6">6, 26, 27</xref>
        ] some recent advances were observed in the application of bithreshold
and multithreshold neural units and networks, respectively. In the bithreshold case it was caused by
new approaches in the synthesis of NN by employing bithreshold neurons in hidden layers of
networks [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. This approach can be combined with reducing the drawbacks of bithreshold
activations [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] by making the network deeper using hybrid blocks, which consist of groups of
heterogeneous neurons preserving the information concerning the location of training patterns [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
A similar approach was proposed in [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], where the smoothed modification of activation function
was used as well as neuron center defined by the portion of training patterns, which activate this
neuron.
      </p>
      <p>
        In the multithreshold case the progress is related to the use of multi-valued outputs instead of
binary ones [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. This leads to a lower complexity of the learning task compared to the case of the
application of binary-valued neurons, because the complexity of the learning of multi-valued
multithreshold neurons proved to be equal to the complexity of the learning of linear single-threshold
units [
        <xref ref-type="bibr" rid="ref29 ref30">29, 30</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Models and methods</title>
      <p>The multi-valued multithreshold model of the neuron will be considered in this section, as well as
issues related to its learning.</p>
      <sec id="sec-3-1">
        <title>3.1. Model of multi-valued multithreshold neural unit</title>
        <p>
          Consider a model of a multithreshold neuron. It is a computational unit provided with a weight vector
w = (w_1, …, w_n) ∈ R^n and an ordered threshold vector t = (t_1, …, t_k) ∈ R^k. Each weight w_i is associated
with the corresponding input x_i, i = 1, …, n. The use of multiple thresholds allows the neuron to operate
in two modes: binary-valued and multi-valued, respectively [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]. Further, only multi-valued neurons
will be considered. The unit output is denoted by y and is defined in the following way:
          <disp-formula id="eq1"><tex-math><![CDATA[
y = \begin{cases}
0, & \text{if } w \cdot x < t_1, \\
1, & \text{if } t_1 \le w \cdot x < t_2, \\
\;\vdots \\
k-1, & \text{if } t_{k-1} \le w \cdot x < t_k, \\
k, & \text{if } t_k \le w \cdot x,
\end{cases} \tag{1}
]]></tex-math></disp-formula>
where w · x denotes the inner product of the weight vector w and the input vector x.
        </p>
        <p>It is evident that the neural unit described by equation (1) takes k + 1 different values. Therefore,
we can use it as the single output node of an NN classifier in the case when the number of classes is
greater by 1 than the number of thresholds.</p>
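        <p>For illustration, the following minimal sketch (in Python with NumPy; the function name is hypothetical, not from the paper) computes the output (1) of a neuron (w, t) on an input x:</p>
        <preformat><![CDATA[
import numpy as np

def multithreshold_output(w, t, x):
    """Output y of the multi-valued k-threshold neuron (w, t) per (1):
    y = i  iff  t_i <= w . x < t_{i+1}, with t_0 = -inf, t_{k+1} = +inf."""
    s = float(np.dot(w, x))
    # count of thresholds t_i with t_i <= s, which is the class index
    return int(np.searchsorted(np.asarray(t), s, side="right"))
]]></preformat>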
        <p>The pair (w, t) completely defines the multi-valued multithreshold neuron. Further, this pair will
be used as the short notation for the multi-valued multithreshold neuron with weight vector w and
threshold vector t.</p>
        <p>Let A be an arbitrary set of patterns in n-dimensional real space. Every multi-valued k-threshold
neuron (w, t) induces the ordered partition (A_0, A_1, …, A_k) of the set A, where the set A_i contains all
elements of the set A such that t_i ≤ w · x &lt; t_{i+1} (i = 0, …, k). Note that two additional
pseudothresholds t_0 = −∞ and t_{k+1} = +∞ were used in the previous inequality for convenience.</p>
        <p>
          This partition is called an ordered k-threshold partition of the set A by strongly k-separable sets
A_0, A_1, …, A_k [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]. Notice that the order matters for such partitions.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Learning of multi-valued k-threshold neural unit</title>
        <p>
          Two algorithms for single multi-valued multithreshold neuron were proposed in [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]. This
subsection contains a brief description of the modification of the first one.
        </p>
        <p>Consider the search for a multi-valued k-threshold neuron (w, t) that performs the desired
ordered partition (A_0, A_1, …, A_k) of the finite set A. We can consider the elements of the set A as
members of our training set. It is evident that without loss of generality one can replace all non-strict
inequalities in (1) by strict ones (this is true because A is finite). Furthermore, it is also easy to show
that the learning task is equivalent to the solution of the following system of linear inequalities:
<disp-formula id="eq2"><tex-math><![CDATA[
\begin{cases}
t_0 < w \cdot x < t_1, & \text{if } x \in A_0, \\
t_1 < w \cdot x < t_2, & \text{if } x \in A_1, \\
\;\vdots \\
t_{k-1} < w \cdot x < t_k, & \text{if } x \in A_{k-1}, \\
t_k < w \cdot x < t_{k+1}, & \text{if } x \in A_k.
\end{cases} \tag{2}
]]></tex-math></disp-formula></p>
        <p>Note that, similarly to the definition of the ordered partition, two additional sentinel thresholds
t_0 = −∞ and t_{k+1} = +∞ were used in (2) in order to simplify notation. There exists, in addition, an
ML-like interpretation of the solution of (2). We can consider it as the task of supervised learning on the
dataset consisting of training pairs (x, y_x), where x ∈ A and y_x = i if and only if x ∈ A_i.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.2.1. Data preprocessing</title>
        <p>
          Consider the method of the transformation of the task (2) to the solution of a homogeneous system
of linear inequalities in n + k variables w_1, …, w_n, t_1, …, t_k, which was proposed in [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ].
        </p>
        <p>Let us search for the solution vector in the form v = (w_1, …, w_n, −t_1, …, −t_k), which contains all sought
weights as well as all negated thresholds. Consider the sequence of transformations f_j : R^n → R^{n+k},
j = 1, …, k, defined by
<disp-formula id="eq3"><tex-math><![CDATA[
f_j(x) = (x_1, \ldots, x_n, e_j), \tag{3}
]]></tex-math></disp-formula>
where e_j denotes the j-th standard unit vector of R^k, so that f_j(x) · v = w · x − t_j. It follows from (3)
that every chained inequality t_j &lt; w · x &lt; t_{j+1} in (2) is equivalent to the following system:
<disp-formula id="eq4"><tex-math><![CDATA[
\begin{cases}
f_j(x) \cdot v > 0, \\
-f_{j+1}(x) \cdot v > 0.
\end{cases} \tag{4}
]]></tex-math></disp-formula></p>
        <p>Thus, it is possible to reduce (2) to the solution of the homogeneous system
<disp-formula id="eq5"><tex-math><![CDATA[
b_i \cdot v > 0, \quad i = 1, \ldots, m, \tag{5}
]]></tex-math></disp-formula>
where the vectors b_i are obtained using (3) and (4). Note that in the case of the use of the
pseudo-thresholds t_0 = −∞ and t_{k+1} = +∞, (5) consists of exactly 2|A| inequalities, where |A| is the
cardinality of the set A, but we can drop the inequalities involving the pseudo-thresholds. Thus,
the actual value of m is 2|A| − |A_0| − |A_k|. Let V(B) denote the set of all solutions of (5).</p>
        <p>
          The reduction process was described in detail in [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ], where the corresponding function
ReduceSet(A_0, A_1, …, A_k) was defined, which returns the set {b_1, …, b_m}.
        </p>
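        <p>For illustration, a minimal sketch of this reduction in Python with NumPy is given below (the function name and data layout are assumptions, not the code of [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ]):</p>
        <preformat><![CDATA[
import numpy as np

def reduce_set(partition):
    """Reduce an ordered partition (A_0, ..., A_k) into the set
    B = {b_1, ..., b_m} such that learning amounts to solving
    b_i . v > 0 for v = (w_1, ..., w_n, -t_1, ..., -t_k), per (3)-(5).
    `partition` is a list of k + 1 arrays of shape (|A_i|, n)."""
    k = len(partition) - 1

    def f(j, x):
        # f_j(x) = (x, e_j): x followed by the j-th unit vector of R^k, eq. (3)
        e = np.zeros(k)
        e[j - 1] = 1.0
        return np.concatenate([x, e])

    B = []
    for i, A_i in enumerate(partition):
        for x in A_i:
            if i > 0:                # t_i < w.x      <=>   f_i(x) . v > 0
                B.append(f(i, x))
            if i < k:                # w.x < t_{i+1}  <=>  -f_{i+1}(x) . v > 0
                B.append(-f(i + 1, x))
    return np.array(B)               # m = 2|A| - |A_0| - |A_k| rows
]]></preformat>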
      </sec>
      <sec id="sec-3-4">
        <title>3.2.2. Online algorithm with shift</title>
        <p>
          Consider the online version of the learning algorithm for a multi-valued k-threshold neural unit. The
idea of this algorithm is from [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] and it actually borrows many steps from the relaxation algorithm
for systems of linear inequalities [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. The pseudocode of this algorithm is shown below:
        </p>
        <preformat><![CDATA[
ShiftedMultithreshold(A_0, A_1, ..., A_k, r, λ, v_0, η, d)
 1  B ← NormalizedSet(A_0, A_1, ..., A_k)
 2  v ← v_0
 3  (i, j, err) ← (0, 0, 1)
 4  while i < r and err > 0:
 5      err ← 0
 6      shuffle B
 7      for b in B:
 8          s ← b · v
 9          if s > 0:
10              continue
11          j ← j + 1
12          err ← err + 1
13          v ← v + η(j)(d − λs)b
14      i ← i + 1
15  w ← (v_1, ..., v_n)
16  t ← (−v_{n+1}, ..., −v_{n+k})
17  return w, t
]]></preformat>
        <p>The above algorithm has a single parameter (A_0, A_1, …, A_k), an ordered partition consisting of
strongly k-separable sets. The algorithm also has five hyperparameters: r, the upper bound on the
number of learning epochs; λ, a binary value defining the learning mode; v_0 ∈ R^{n+k}, an initial
approximation; η, the schedule function defining the value of the learning rate; and d, a
nonnegative real value, which is a measure of the shift used during each correction. They are all used in
the crucial step 13, where the correction of the vector v is performed. The learning process continues
until we find such a vector v ∈ R^{n+k} that all inequalities in (5) are satisfied. If this is not true, then there
exists a vector b such that b · v ≤ 0. This vector b is used in step 13 in order to improve the current
value of the vector v by nudging it in the direction of b. Note that the inner product s is used in this step
to define the correction step, as well as the shift d and the current value of the learning rate.</p>
        <p>
          It should be mentioned that the considered algorithm differs from the similar algorithm from [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]
only in steps 1 and 13, respectively. In step 1, an additional preprocessing transformation is
performed, which consists in the normalization of the elements of the set B in order to obtain a set
of vectors with unit Euclidean norm. The proposed modification of the learning algorithm also uses an
additional hyperparameter d in step 13, which should be non-negative. This allows us to avoid
possible convergence to a point lying on the bounding surface of the set V(B). Notice also that the
correction is performed only in the case s ≤ 0. Hence, during every iteration performed in steps
7–13 over the elements of the set B, the value of d − s is always equal to d + |s| if correction step 13
was reached.
        </p>
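        <p>A minimal NumPy sketch of the above procedure is shown below. It assumes the step-13 rule v ← v + η(j)(d − λs)b reconstructed above, and the schedule is passed the current inner product s so that rates of the form (6) can be expressed:</p>
        <preformat><![CDATA[
import numpy as np

def shifted_multithreshold(B, r, lam, v0, eta, d, k, rng=None):
    """Sketch of ShiftedMultithreshold. B holds the vectors b_1, ..., b_m
    produced by the reduction of section 3.2.1; eta(j, s) is the
    learning-rate schedule; lam = 0 gives perceptron-like corrections,
    lam = 1 relaxation-like ones."""
    rng = rng or np.random.default_rng()
    B = B / np.linalg.norm(B, axis=1, keepdims=True)   # step 1: unit norms
    v = np.asarray(v0, dtype=float).copy()
    i = j = 0
    err = 1
    while i < r and err > 0:                           # step 4
        err = 0
        rng.shuffle(B)                                 # step 6
        for b in B:
            s = b @ v                                  # step 8
            if s > 0:                                  # inequality holds
                continue
            j += 1
            err += 1
            v += eta(j, s) * (d - lam * s) * b         # step 13 (shift d)
        i += 1
    n = v.size - k
    return v[:n], -v[n:]                               # steps 15-17: (w, t)
]]></preformat>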
        <p>The issues related to the convergence of the above algorithm will be considered in the next
subsection.</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.3. Convergence conditions for learning algorithms</title>
        <p>Let us consider the theoretical foundations of the above algorithm that ensure its convergence and even
finiteness.</p>
        <p>Proposition. If the finite sets A_0, A_1, …, A_k are strongly k-separable,
<disp-formula id="eq6"><tex-math><![CDATA[
\eta(j) = \eta_1(j) + \frac{\eta_2(j)}{d + |b_j \cdot v_{j-1}|}, \tag{6}
]]></tex-math></disp-formula>
<disp-formula id="eq7"><tex-math><![CDATA[
0 \le \eta_1(j) \le 2, \qquad 0 \le \eta_2(j) \le \eta_{\max}, \qquad \eta_1(j) + \eta_2(j) \ge \eta_{\min}, \tag{7}
]]></tex-math></disp-formula>
where η_min and η_max are arbitrary positive constants, b_j is the training vector used in the jth correction,
and v_{j−1} is the value of the sought vector v after the previous correction, then there exists r such that
after at most r corrections ShiftedMultithreshold yields a multi-valued k-threshold neuron (w, t), which
produces the partition (A_0, A_1, …, A_k).</p>
        <p>
          In the first case (λ = 0) the learning process is similar to classical perceptron learning with the
learning rates dη(j) used in the jth correction, or to its extension to the case of multi-valued
multithreshold functions proposed by Obradović and Parberry (see [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]). It is well known [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] that the equality
<disp-formula id="eq8"><tex-math><![CDATA[
\lim_{r \to \infty} \frac{\sum_{j=1}^{r} \eta^2(j)}{\Bigl(\sum_{j=1}^{r} \eta(j)\Bigr)^{2}} = 0 \tag{8}
]]></tex-math></disp-formula>
is a sufficient condition of the finiteness of the learning. Let us prove that (8) follows from the
correction rule in step 13 and conditions (6), (7).
        </p>
        <p>Prove first that the sequence (S_r)_{r∈N} is divergent, where S_r = Σ_{j=1}^{r} η(j) (note that the denominator
of the fraction in (8) contains the squared value of S_r). Suppose the contrary. Then
<disp-formula><tex-math><![CDATA[
|b_j \cdot v_{j-1}| = |b_j \cdot (v_0 + d\,\eta(1) b_1 + \cdots + d\,\eta(j-1) b_{j-1})| \le \|v_0\| + d\,S_{j-1} \le D
]]></tex-math></disp-formula>
for some positive constant D, because the dot product of unit vectors does not exceed 1.</p>
        <p>Therefore, η(j) ≥ η_1(j) + η_2(j)/(d + D) ≥ η_min · min(1, (d + D)^{−1}) > 0. This implies that (S_r) is divergent.</p>
        <p>Thus, our assumption about the convergence of the sequence (S_r) was wrong. Therefore, under the
conditions of the Proposition this sequence always diverges.</p>
        <p>Consider the numerator in (8). We can split the corresponding sum into two parts:
<disp-formula><tex-math><![CDATA[
\sum_{j=1}^{r} \eta^2(j) = \sum_{j:\,\eta(j) \le 1} \eta^2(j) + \sum_{j:\,\eta(j) > 1} \eta^2(j).
]]></tex-math></disp-formula></p>
        <p>Let S′_r be the first sum in the previous equation. It is evident that
<disp-formula><tex-math><![CDATA[
S'_r = \sum_{j:\,\eta(j) \le 1} \eta^2(j) \le \sum_{j:\,\eta(j) \le 1} \eta(j) \le \sum_{j=1}^{r} \eta(j) = S_r.
]]></tex-math></disp-formula>
Therefore,
<disp-formula><tex-math><![CDATA[
\frac{S'_r}{S_r^2} \le \frac{S_r}{S_r^2} = \frac{1}{S_r} \xrightarrow[r \to \infty]{} 0.
]]></tex-math></disp-formula></p>
        <p>Consider S″_r, the second sum in the corresponding equation. By (6) and (7),
<disp-formula><tex-math><![CDATA[
S''_r = \sum_{j:\,\eta(j) > 1} \Bigl(\eta_1(j) + \frac{\eta_2(j)}{d + |b_j \cdot v_{j-1}|}\Bigr)^{2} \le \sum_{j:\,\eta(j) > 1} \Bigl(2 + \frac{\eta_{\max}}{d}\Bigr)^{2} = n''_r \Bigl(2 + \frac{\eta_{\max}}{d}\Bigr)^{2},
]]></tex-math></disp-formula>
where n″_r is the number of terms in S″_r. If for all r the numbers n″_r are bounded by some n_max ∈ N, then
<disp-formula><tex-math><![CDATA[
\lim_{r \to \infty} \frac{S''_r}{S_r^2} \le \Bigl(2 + \frac{\eta_{\max}}{d}\Bigr)^{2} \lim_{r \to \infty} \frac{n_{\max}}{S_r^2} = 0.
]]></tex-math></disp-formula>
Otherwise, let us estimate S_r²:
<disp-formula><tex-math><![CDATA[
S_r^2 = \Bigl(\sum_{j=1}^{r} \eta(j)\Bigr)^{2} \ge \Bigl(\sum_{j:\,\eta(j) > 1} \eta(j)\Bigr)^{2} \ge (n''_r)^2.
]]></tex-math></disp-formula>
Hence,
<disp-formula><tex-math><![CDATA[
\lim_{r \to \infty} \frac{S''_r}{S_r^2} \le \lim_{r \to \infty} \frac{n''_r \bigl(2 + \eta_{\max}/d\bigr)^{2}}{(n''_r)^2} = \Bigl(2 + \frac{\eta_{\max}}{d}\Bigr)^{2} \lim_{r \to \infty} \frac{1}{n''_r} = 0.
]]></tex-math></disp-formula>
Therefore,
<disp-formula><tex-math><![CDATA[
\lim_{r \to \infty} \frac{1}{S_r^2} \sum_{j=1}^{r} \eta^2(j) = \lim_{r \to \infty} \Bigl(\frac{S'_r}{S_r^2} + \frac{S''_r}{S_r^2}\Bigr) = 0,
]]></tex-math></disp-formula>
and (8) holds.</p>
        <p>Consider now the case λ = 1. Let us prove that the sequence (v_j) satisfies the Fejér condition
<disp-formula id="eq9"><tex-math><![CDATA[
\|v_j - v\| \le \|v_{j-1} - v\| \tag{9}
]]></tex-math></disp-formula>
for all v ∈ V(B). It is evident that (9) is equivalent to ‖v_j − v‖² ≤ ‖v_{j−1} − v‖². Since
<disp-formula><tex-math><![CDATA[
\|v_j - v\|^2 = \|v_j - v_{j-1}\|^2 + 2 (v_j - v_{j-1}) \cdot (v_{j-1} - v) + \|v_{j-1} - v\|^2,
]]></tex-math></disp-formula>
the Fejér condition (9) is satisfied if
<disp-formula><tex-math><![CDATA[
\|v_j - v_{j-1}\|^2 + 2 (v_j - v_{j-1}) \cdot (v_{j-1} - v) \le 0
]]></tex-math></disp-formula>
for all v ∈ V(B). We can rewrite step 13 of the learning algorithm in the following way:
<disp-formula id="eq10"><tex-math><![CDATA[
v_j = v_{j-1} + \eta(j) (d - b_j \cdot v_{j-1})\, b_j. \tag{10}
]]></tex-math></disp-formula>
Therefore, it is possible to rewrite the last inequality as follows:
<disp-formula><tex-math><![CDATA[
\eta^2(j) (d - b_j \cdot v_{j-1})^2 \|b_j\|^2 + 2 \eta(j) (d - b_j \cdot v_{j-1})\, b_j \cdot (v_{j-1} - v) \le 0.
]]></tex-math></disp-formula>
Remember that b_j · v_{j−1} ≤ 0 in every correction. Thus, d − b_j · v_{j−1} = d + |b_j · v_{j−1}| and the last
quadratic inequality holds only if
<disp-formula><tex-math><![CDATA[
0 \le \eta(j) \le \frac{2 (v - v_{j-1}) \cdot b_j}{d + |b_j \cdot v_{j-1}|}.
]]></tex-math></disp-formula>
We can rewrite this inequality in the following form:
<disp-formula id="eq11"><tex-math><![CDATA[
0 \le \eta(j) \le 2 \Bigl(1 + \frac{v \cdot b_j - d}{d + |b_j \cdot v_{j-1}|}\Bigr). \tag{11}
]]></tex-math></disp-formula></p>
        <p>Let us slightly relax the Fejér condition from the whole set V(B) to its own subsets. By using the
techniques described in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], it is easy to verify that for every d ≥ 0 and every ε > 0 the cone V(B)
contains such a point v* = v*(d, ε) that the closed unit ball B₁(v*) = {x ∈ R^{n+k} : ‖x − v*‖ ≤ 1} is a subset
of V(B) and x · b ≥ d + ε for all x ∈ B₁(v*) and all b ∈ B. Therefore, it follows from (11) that the sequence (v_j)
satisfies the Fejér condition (9) for the ball B₁(v*) if
<disp-formula id="eq12"><tex-math><![CDATA[
0 \le \eta(j) \le 2 \Bigl(1 + \frac{\varepsilon}{d + |b_j \cdot v_{j-1}|}\Bigr). \tag{12}
]]></tex-math></disp-formula>
Let ε = η_max/2. If (6) and (7) are satisfied, then (12) holds.</p>
        <p>Suppose that the learning process is infinite, i.e., for all r ShiftedMultithreshold is unable to
produce v_j ∈ V(B) for some j ≤ r. Then the sequence (v_j) satisfies the Fejér condition (9) for
the ball B₁(v*) and, hence, is convergent by a well-known fact from the theory of linear normed spaces
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].</p>
        <p>Consider the increment vectors Δv_j = v_j − v_{j−1}. It follows from (6), (7), (10) and (12) that
<disp-formula><tex-math><![CDATA[
\Delta v_j = \Bigl(\eta_1(j) + \frac{\eta_2(j)}{d + |b_j \cdot v_{j-1}|}\Bigr) (d + |b_j \cdot v_{j-1}|)\, b_j = \bigl(\eta_1(j)(d + |b_j \cdot v_{j-1}|) + \eta_2(j)\bigr)\, b_j.
]]></tex-math></disp-formula>
It implies
<disp-formula><tex-math><![CDATA[
\|\Delta v_j\| = \bigl(\eta_1(j)(d + |b_j \cdot v_{j-1}|) + \eta_2(j)\bigr) \|b_j\| \ge \eta_{\min} \min\bigl(d + |b_j \cdot v_{j-1}|,\, 1\bigr) \ge \eta_{\min} \min(d, 1).
]]></tex-math></disp-formula></p>
        <p>It follows from the last inequality that the increment vectors do not go to zero as j goes to infinity.
Therefore, the sequence (v_j) is not convergent. This apparent contradiction completes the proof in
the case λ = 1.</p>
        <p>Note that in the case λ = 1 the convergence conditions (without the proof) appeared for the first time
in [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ].</p>
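        <p>For illustration, a schedule of the form (6) with constant terms η_1 and η_2 can be written as a small closure (a sketch; the parameter values are placeholders):</p>
        <preformat><![CDATA[
def make_eta(eta1=2.0, eta2=0.3, d=0.05):
    """Schedule satisfying (6)-(7) with constant eta_1(j) = eta1 and
    eta_2(j) = eta2: here one may take eta_min = eta1 + eta2 and
    eta_max = eta2, so (7) requires 0 <= eta1 <= 2 and eta2 > 0.
    `d` must match the shift used by the learning algorithm."""
    def eta(j, s):
        # eta(j) = eta_1(j) + eta_2(j) / (d + |b_j . v_{j-1}|)   -- eq. (6)
        return eta1 + eta2 / (d + abs(s))
    return eta
]]></preformat>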
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiment</title>
      <p>In the above theoretical study of the issues related to the algorithm convergence and finiteness, the
range of feasible values for the learning rate hyperparameter was found, but the proved Proposition does
not suggest which values are preferable in order to ensure faster convergence.</p>
      <p>This question can be clarified by an empirical study of the dependence of ShiftedMultithreshold on
different strategies for choosing the values of the hyperparameters used in this algorithm.</p>
      <p>
        During the simulation, k-threshold neurons were trained for different 2 ≤ k ≤ 10. This range of values
was chosen in accordance with the recommendation from the paper [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ]. A randomly generated
k-threshold neuron was used to produce a partition (A_0, A_1, …, A_k) of the set A containing M uniformly
distributed points from the n-dimensional hypercube [−1, 1]^n, where the power denotes the n-fold
Cartesian product. Two series of experiments were performed. In the first series the whole set
A was used as the training set. In the second, A was randomly split into a training set and a test set, where
the test set contained 20% of all points. The first series of experiments was more intensive. Only this series
was used to determine the values of the last four hyperparameters of the algorithm, which were then used in
the second series. For this reason, most of this section is devoted to the description of the
experiments of the first type.
      </p>
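      <p>A sketch of this data-generation step is given below (in Python; the way the random neuron (w, t) is sampled is an assumption, since the paper does not specify it):</p>
      <preformat><![CDATA[
import numpy as np

def random_partition(M=1024, n=50, k=3, rng=None):
    """Generate the ordered partition (A_0, ..., A_k) of M points drawn
    uniformly from [-1, 1]^n, labeled by a random k-threshold neuron.
    The sampling of (w, t) below is an assumption."""
    rng = rng or np.random.default_rng()
    X = rng.uniform(-1.0, 1.0, size=(M, n))
    w = rng.standard_normal(n)
    s = X @ w
    # draw k ordered thresholds from the empirical range of w . x,
    # so that every class is likely to be non-empty
    t = np.sort(rng.uniform(s.min(), s.max(), size=k))
    y = np.searchsorted(t, s, side="right")   # y = i  iff  t_i <= w.x < t_{i+1}
    return [X[y == i] for i in range(k + 1)]
]]></preformat>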
      <p>Note that the value of r was not studied in the first series of experiments, and the constant upper
bound of 100,000 was used for the number of learning epochs in the first experiment. Reaching this
bound during learning was considered a signal that the algorithm failed to train the neuron to solve the
given task. In the next experiments r was reduced to 1,000.</p>
      <p>The final value of the counter j, which corresponds to the total number of corrections performed
during the learning process, was considered as the performance metric. Therefore, the further
statement that X performed better than Y by 30% means that the number of corrections in the
case of X was lower by 30% than the number of corrections in the case of Y.</p>
      <p>The general tendency during the first series of experiments remained the same for every value of
k from the above-mentioned range. For this reason, results will be presented only for a single value
of k, namely, k = 3. This implies that 4-valued units will be considered.</p>
      <p>The dimension of the feature space n was chosen to be 50. Different sizes of the training set were
tried. In the next section results for M from {256, 512, 1024, 2048, 4096} will be presented. Random
sampling was used. Each experiment was repeated 110 times (more precisely, 11 random
partitionings were performed for each of 10 randomly chosen sets A), and the 5 best and 5 worst results
were rejected in order to avoid outliers. The remaining 100 results were averaged. The obtained
means will be analyzed in the next section.</p>
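      <p>This averaging protocol amounts to a trimmed mean (a minimal sketch):</p>
      <preformat><![CDATA[
import numpy as np

def trimmed_mean(results, cut=5):
    """Drop the `cut` best and `cut` worst of the repeated runs and
    average the rest, as in the protocol above (110 -> 100 results)."""
    r = np.sort(np.asarray(results, dtype=float))
    return r[cut:len(r) - cut].mean()
]]></preformat>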
      <p>
        The first experiment consists in the estimation of the influence of the value of the binary
hyperparameter λ on the performance of the learning algorithm with a random initial approximation (more
precisely, random numbers uniformly distributed in (−1, 1) were used as the coordinates of v_0). The
constant learning rate η = 2 was used for both possible values of the hyperparameter λ. This value is
suggested by [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] as recommended in the case of relaxation-based algorithms. Note that the application
of any constant learning rate means that for all j, η_1(j) = η and η_2(j) = 0, because otherwise it follows
from (6) that η(j) depends on j. The case λ = 0 corresponds to the fixed increment used in classical
perceptron-like models. The opposite case λ = 1 leads to relaxation-like learning in which the
increment in the jth correction is adaptive and depends on the classification error on the current
training pattern b_j measured by |b_j · v_{j−1}| in (10). It was observed that the relaxation-like approach to
the learning considerably outperformed the perceptron-like one in the online learning of 4-valued
3-threshold neurons. A grid search over the segment [1, 2] was also performed with the step 0.01 in the
case λ = 0, but it did not significantly change the difference of learning times for both
above-mentioned types of the increment (actually, the change of η influenced the performance in the
relaxation-like mode much more strongly than in the perceptron-like one). Therefore, the perceptron-like
approach to online learning was rejected, the value λ = 1 was fixed, and, consequently, only
relaxation-like online learning was studied in all subsequent experiments.
      </p>
      <p>During the second simulation the choice of the initial approximation was considered alongside the
constant learning rate. The learning with a randomly chosen v_0 was compared with the optimized initial
approximation v̄_0 = (b_1 + … + b_m)/m, where m = |B|. Both the idea and the justification of such an
approximation are from [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ]. The idea of the use of v̄_0 is suggested by the fact that its coordinates have signs and
an ordering similar to the same characteristics of the coordinates of a feasible solution from the set V(B).
For all considered M, the results for v̄_0 were on average at least twice as good as for a random v_0. For
this reason, only v̄_0 was used further.</p>
      <p>The next simulation was devoted to the search for appropriate values of the first term η_1(j) of the
learning rate in (6). In order to reduce the impact of the second term in (6), η_2(j) = 0 was used here. A
quite simple constant schedule strategy was used, i.e., η was assigned to η_1(j) (and, consequently, to η(j))
in every correction step. No η tried outside the segment [1.23, 2.31] was successful, and only
η ∈ [1.5, 2.2] performed well. For this reason, a grid search on [1.5, 2.2] with the step 0.001 was used to
determine η_1^(M), empirically the best value for the given M. Further simulation was devoted to the
search for appropriate values of the second term η_2(j) of the learning rate in (6). It was observed that
only constant η_2(j) = η_2 ∈ [0.1, 0.5] provided good performance. Another grid search was performed
over two dimensions to find (η_1^(M), η_2^(M)), empirically the best pair for the given M. In the next
simulation the impact of the value of the shift hyperparameter d was studied. The learning rate was
calculated by using (6) and the pairs (η_1^(M), η_2^(M)) from the previous experiment. The last
simulation was the second series of experiments. Previously found values of the hyperparameters were
used to solve the classification task on the split dataset in order to estimate the generalization ability of
the multi-valued k-threshold neuron in the case of different 2 ≤ k ≤ 10.</p>
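      <p>Putting the pieces together, an end-to-end run of the first-series setup might look as follows (a sketch reusing the hypothetical functions from the previous sections; the hyperparameter values follow the recommendations of section 5):</p>
      <preformat><![CDATA[
import numpy as np

# Assumes random_partition, reduce_set and shifted_multithreshold
# from the sketches above.
partition = random_partition(M=1024, n=50, k=3)
B = reduce_set(partition)
v0 = B.mean(axis=0)                    # optimized initial approximation
eta = lambda j, s: 2.0                 # constant rate in the good range
w, t = shifted_multithreshold(B, r=1000, lam=1, v0=v0,
                              eta=eta, d=0.0, k=3)
]]></preformat>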
    </sec>
    <sec id="sec-5">
      <title>5. Results and discussion</title>
      <p>Consider the results that were obtained in the above-mentioned experiments. Table 1 contains comparative
results of the perceptron-like (λ = 0) and relaxation-like (λ = 1) learning modes, respectively, in the
case of the learning of a 3-threshold neuron with the constant learning rate η = 2.</p>
      <p>It is evident from Table 1 that the learning mode has a great impact on the performance.</p>
      <p>The single adaptive correction (10) allows moving the vector v into the right half-space in accordance
with the violated inequality b_j · v_{j−1} ≤ 0, instead of the numerous fixed increments in the direction of b_j, which
are necessary for the perceptron. Thus, we obtained empirical proof of the significant advantage of the
relaxation approach in the online learning of k-threshold neurons. Consider the results concerning the
impact of the optimized initial approximation. They are presented in Table 2, also in the case of the
learning of a 3-threshold neuron with η = 2. It is evident from Table 1 and Table 2 that the optimized
initial approximation can at least halve the number of corrections. Thus, it provides an important
improvement of the learning process.</p>
      <p>Consider the performance results in the case of different constant values of the learning rate. In Table 3
the best value of the learning rate for every dataset size is shown, which was found using the grid
search, as well as the average number of corrections for it. Consider learning in the more general case
when constant pairs (η_1, η_2) were used. The corresponding results are presented in Table 4.</p>
      <p>The final experiment of the first series consists in the study of the role of the hyperparameter d in the
learning. It was observed that for all datasets the best performance was obtained using d = 0.
Moreover, in the case d ≥ 0.1 the learning became considerably slower. Consider the second series
of experiments. Unlike the first series, performed only for k = 3, the second series consisted in
the learning of a multi-valued k-threshold neuron for all 2 ≤ k ≤ 10 using λ = 1, the optimized initial
approximation calculated only on the proper training set, and the values of (η_1, η_2) from Table 4. The
shift was not performed. Table 5 contains the average percentage accuracy of a trained k-threshold
neuron measured on the test set for every combination of the dataset size and the number
of thresholds.</p>
      <p>The learning mode defined by λ is extremely significant for the performance of online learning,
and its proper value λ = 1 decreases the number of corrections tenfold.</p>
      <p>The initial approximation also matters. The use of the improved approximation requires
additional calculations, but it can reduce the number of corrections by a factor of 2 to 4 compared
to a random initial approximation.</p>
      <p>A constant learning rate in the range 1.93 ≤ η ≤ 2.05 is a good choice for relaxation learning.
The best values of the second term in (6) were quite low compared to the first term.
The variation of the values of η_1 and η_2 is not so important and could improve the
performance by 6–12%.</p>
      <p>The generalization ability of the k-threshold neuron decreases with the growth of k.
The shift hyperparameter d has mainly theoretical importance as a guarantee of finite
learning. Its practical application is limited to small values, whereas larger d can significantly
slow down the learning process.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>
        The modification of the online learning algorithm for multi-valued multithreshold neurons has been
considered. It uses an additional preprocessing step as well as a new shift parameter d ensuring the
convergence. Conditions have been proved for the first time that guarantee the finiteness of the
learning process. The influence of the algorithm hyperparameters on the behavior of the learning
algorithm has also been studied. Suggestions were stated concerning the preferred values of the
hyperparameters, which provided better performance during experiments on synthetic datasets.
Simulation results proved the advantage of the relaxation learning mode over the perceptron-like one and
testified that the proposed algorithm is able to greatly outperform the procedure of
Obradović and Parberry [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The use of the optimized initial solution also has a great positive impact on
the performance. Despite the fact that the quantitative characteristics of the improvement presented in
the fifth section are not absolute and may vary depending on the dimension of the feature space, the
content and the size of a dataset, as well as other factors, the proposed recommendations could be useful
for ML projects employing NNs designed using multi-valued multithreshold neurons.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V.K.</given-names>
            <surname>Venkatesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.T.</given-names>
            <surname>Ramakrishna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Batyuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Havrysh</surname>
          </string-name>
          ,
          <article-title>High-Performance artificial intelligence recommendation of quality research papers using effective collaborative approach</article-title>
          ,
          <source>Systems 11.2</source>
          (
          <year>2023</year>
          ):
          <fpage>81</fpage>
          . doi:10.3390/systems11020081.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Géron</surname>
          </string-name>
          ,
          <article-title>Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems</article-title>
          , 3rd ed.,
          <string-name>
            <surname>O'Reilly Media</surname>
          </string-name>
          , Sebastopol, CA,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.H.</given-names>
            <surname>Houssein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.E.</given-names>
            <surname>Hosney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.M.</given-names>
            <surname>Emam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.M.</given-names>
            <surname>Younis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.A.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.M.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <article-title>Soft computing techniques for biomedical data analysis: open issues and challenges</article-title>
          ,
          <source>Artificial Intelligence Review</source>
          <volume>56</volume>
          (
          <year>2023</year>
          ):
          <fpage>2599</fpage>
          <lpage>2649</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>I.</given-names>
            <surname>Izonin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tkachenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Mitoulis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Faramarzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Tsmots</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mashtalir</surname>
          </string-name>
          ,
          <article-title>Machine learning for predicting energy efficiency of buildings: a small data approach</article-title>
          , in: Procedia Computer Science, volume
          <volume>231</volume>
          ,
          <year>2024</year>
          , pp.
          <fpage>72</fpage>
          <lpage>77</lpage>
          . doi:10.1016/j.procs.2023.12.173.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F.</given-names>
            <surname>Geche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kotsovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Batyuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Geche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vashkeba</surname>
          </string-name>
          ,
          <article-title>Synthesis of time series forecasting scheme based on forecasting models system</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>1356</volume>
          ,
          <year>2015</year>
          , pp.
          <fpage>121</fpage>
          <lpage>136</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Tkachenko</surname>
          </string-name>
          ,
          <article-title>An integral software solution of the SGTM neural-like structures implementation for solving different Data Mining tasks</article-title>
          , in: S.
          <string-name>
            <surname>Babichev</surname>
          </string-name>
          , V. Lytvynenko (Eds.),
          <source>Lecture Notes on Data Engineering and Communications Technologies</source>
          , volume
          <volume>77</volume>
          , Springer, Cham,
          <year>2022</year>
          , pp.
          <fpage>696</fpage>
          <lpage>713</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Havryliuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hovdysh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tolstyak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chopyak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kustra</surname>
          </string-name>
          ,
          <article-title>Investigation of PNN optimization methods to improve classification performance in transplantation medicine</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3609</volume>
          ,
          <year>2023</year>
          , pp.
          <fpage>338</fpage>
          <lpage>345</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Vladov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Yakovliev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bulakh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vysotska</surname>
          </string-name>
          ,
          <article-title>Neural network approximation of helicopter turboshaft engine parameters for improved efficiency</article-title>
          ,
          <source>Energies</source>
          <volume>17</volume>
          .9 (
          <year>2024</year>
          ):
          <fpage>2233</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Haykin</surname>
          </string-name>
          ,
          <source>Neural Networks and Learning Machines</source>
          , 3rd ed.,
          Pearson Education
          , Upper Saddle River, NJ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>O.</given-names>
            <surname>Kuchanskyi</surname>
          </string-name>
          et al.,
          <article-title>Gender-related differences in the citation impact of scientific publications and</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Anthony</surname>
          </string-name>
          ,
          <article-title>Learning multivalued multithreshold functions</article-title>
          ,
          <source>CDAM Research Report no. LSECDAM-2003-03</source>
          , London School of Economics,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>B.</given-names>
            <surname>Amirgaliyev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kuchanskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Andrashko</surname>
          </string-name>
          ,
          <article-title>Building a dynamic model of profit maximization</article-title>
          ,
          <source>Eastern-European Journal of Enterprise Technologies</source>
          ,
          <volume>2</volume>
          .4 (
          <issue>116</issue>
          ) (
          <year>2022</year>
          ):
          <fpage>22</fpage>
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>V.</given-names>
            <surname>Vysotska</surname>
          </string-name>
          et al.,
          <article-title>Sentiment analysis of information space as feedback of target audience for regional e-business support in Ukraine</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3426</volume>
          ,
          <year>2023</year>
          , pp.
          <fpage>488</fpage>
          <lpage>513</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rajput</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sreenivasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Papailiopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Karbasi</surname>
          </string-name>
          ,
          <article-title>An exponential improvement on the memorization capacity of deep threshold networks</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems</source>
          , volume
          <volume>16</volume>
          ,
          <year>2021</year>
          , pp.
          <fpage>12674</fpage>
          <lpage>12685</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Z.-G.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-L.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <article-title>Unitary learning in conditional models for deep optics neural networks</article-title>
          ,
          <source>in: Proceedings of SPIE The International Society for Optical Engineering</source>
          , volume
          <volume>12565</volume>
          ,
          <year>2023</year>
          , no.
          <volume>1256543</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kotsovsky</surname>
          </string-name>
          ,
          <article-title>Hybrid 4-layer bithreshold neural network for multiclass classification</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3387</volume>
          ,
          <year>2023</year>
          , pp.
          <fpage>212</fpage>
          <lpage>223</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>R.</given-names>
            <surname>Takiyama</surname>
          </string-name>
          , Multiple threshold perceptron,
          <source>Pattern Recognition 10.1</source>
          (
          <year>1978</year>
          ):
          <fpage>27</fpage>
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , X. Ma,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Analysis of nonseparable property of multi-valued multi-threshold neuron</article-title>
          ,
          <source>in: Proceedings of 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence)</source>
          , Hong Kong, China,
          <year>2008</year>
          , pp.
          <fpage>413</fpage>
          -
          <lpage>419</lpage>
          , doi:10.1109/IJCNN.2008.4633825.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Haring</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Diephuis</surname>
          </string-name>
          ,
          <article-title>A realization procedure for multithreshold threshold elements</article-title>
          ,
          <source>IEEE Transactions on Electronic Computers, EC-16.6</source>
          (
          <year>1967</year>
          ):
          <fpage>828</fpage>
          -
          <lpage>835</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kotsovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Batyuk</surname>
          </string-name>
          ,
          <article-title>Multithreshold neural units and networks</article-title>
          ,
          <source>in: Proceedings of IEEE 18th International Conference on Computer Sciences and Information Technologies</source>
          ,
          CSIT
          <year>2023</year>
          , Lviv, Ukraine,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          , doi:10.1109/CSIT61576.2023.10324129.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>R.</given-names>
            <surname>Takiyama</surname>
          </string-name>
          ,
          <article-title>The separating capacity of a multithreshold threshold element</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence. PAMI-7</source>
          .1 (
          <year>1985</year>
          ):
          <fpage>112</fpage>
          <lpage>116</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>K.</given-names>
            <surname>Ashenayi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vogh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.R.</given-names>
            <surname>Sayeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Karimi</surname>
          </string-name>
          , T. Baradaran,
          <article-title>Multiple threshold perceptron using sinusoidal function</article-title>
          ,
          <source>International Journal of Modelling and Simulation 12.1</source>
          (
          <year>1992</year>
          ):
          <fpage>22</fpage>
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kotsovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Batyuk</surname>
          </string-name>
          ,
          <article-title>Feed-forward neural network classifiers with bithreshold-like activations</article-title>
          ,
          <source>in: Proceedings of IEEE 17th International Scientific and Technical Conference on Computer Sciences and Information Technologies</source>
          ,
          CSIT
          <year>2022</year>
          , Lviv, Ukraine,
          <year>2022</year>
          , pp.
          <fpage>9</fpage>
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>S.</given-names>
            <surname>Olafsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. S.</given-names>
            <surname>Abu-Mostafa</surname>
          </string-name>
          ,
          <article-title>The capacity of multilevel threshold function</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>10</volume>
          .2 (
          <year>1988</year>
          ):
          <fpage>277</fpage>
          <lpage>281</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. X.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. M.</given-names>
            <surname>Ma</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z. Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Using three layer neural network to compute multi-valued functions</article-title>
          ,
          <source>in 2007 Fourth International Symposium on Neural Networks, June 3-7</source>
          ,
          <year>2007</year>
          , Nanjing, P.R. China, Part III
          , LNCS
          <volume>4493</volume>
          ,
          <year>2007</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>V.K.</given-names>
            <surname>Venkatesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Izonin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Periyasamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Indirajithu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Batyuk</surname>
          </string-name>
          , M.T. Ramakrishna,
          <article-title>Incorporation of energy efficient computational strategies for clustering and routing in heterogeneous networks of smart city</article-title>
          ,
          <source>Energies</source>
          <volume>15</volume>
          .20 (
          <year>2022</year>
          ):
          <fpage>7524</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Andrashko</surname>
          </string-name>
          et al.,
          <article-title>A method for assessing the productivity trends of collective scientific subjects based on the modified PageRank algorithm</article-title>
          ,
          <source>Eastern-European Journal of Enterprise Technologies</source>
          ,
          <volume>1</volume>
          .4 (
          <issue>121</issue>
          ) (
          <year>2023</year>
          ):
          <fpage>41</fpage>
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kotsovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Batyuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Voityshyn</surname>
          </string-name>
          ,
          <article-title>On the size of weights for bithreshold neurons and networks</article-title>
          ,
          <source>in: Proceedings of IEEE 16th International Conference on Computer Sciences and Information Technologies</source>
          ,
          CSIT
          <year>2021</year>
          , Lviv, Ukraine,
          <year>2021</year>
          , volume
          <volume>1</volume>
          , pp.
          <fpage>13</fpage>
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>E.</given-names>
            <surname>Baum</surname>
          </string-name>
          ,
          <article-title>On the capabilities of multilayer perceptrons</article-title>
          ,
          <source>Journal of Complexity 4.3</source>
          (
          <year>1988</year>
          ):
          <fpage>193</fpage>
          <lpage>215</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kotsovsky</surname>
          </string-name>
          ,
          <article-title>Learning of multi-valued multithreshold neural units</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3688</volume>
          ,
          <year>2024</year>
          , pp.
          <fpage>39</fpage>
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>