<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Learning of Multi-valued Multithreshold Neural Units</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vladyslav Kotsovsky</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>State University “Uzhhorod National University”</institution>
          ,
          <addr-line>Narodna Square 3, Uzhhorod, 88000</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The issues related to the use of multithreshold neural units in multiclass classification are treated in the paper. Two models of multi-valued k-threshold neurons are considered. Online and offline modifications of the learning algorithm are designed to train multithreshold neurons to solve multiclass classification tasks using simple and fast learning techniques. Conditions are found that ensure the finiteness of the training. The experimental results demonstrate the performance of the multithreshold multiclass classifier on real-world datasets in comparison with some popular classifiers.</p>
      </abstract>
      <kwd-group>
        <kwd>Multithreshold neuron</kwd>
        <kwd>multi-valued neuron</kwd>
        <kwd>machine learning</kwd>
        <kwd>neural network</kwd>
        <kwd>classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Neural-like networks and systems have numerous applications in artificial intelligence [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and
intelligent data analysis [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. They are used in modern hardware [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and software [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] tools and
products [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ]. The remarkable capabilities of artificial neural networks (ANN) are provided by the
appropriate use of the network architecture [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and related learning techniques [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ].
      </p>
      <p>
        The synergy between the network architecture, the kind of network nodes and the network
learning (or synthesis) procedures is very important in the practice of neural computations [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
Linear neural units with threshold activation functions [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], binary inputs and output were used
in early models [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. This kind of computation unit was inspired by the models of biological
neurons from brain studies [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. But both the theoretical studies and practical applications
showed the strong limitations of the basic neuron model of McCulloch and Pitts [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ] as well
as difficulties related to the learning of threshold ANN [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ]. In order to overcome
the abovementioned limitations and difficulties, many more complicated models of neural devices were
proposed [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ]. The overwhelming majority of these models employed two ways to increase the
network capacity by enhancing the power of the network neurons [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The first is based on the use of
more sophisticated models of the aggregation of the input signals of the neural unit instead of the
classical weighted sum of inputs [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], e.g., polynomial threshold units [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ]. The second
approach consists in the use of more complicated activation functions instead of the step function
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] from the Rosenblatt model [16, 17]. Both approaches have their pros and cons discussed in
[
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref14">10–14</xref>
        ].
      </p>
      <p>The multithreshold models were developed under the second approach [18]. One of the
earliest among them was the multithreshold threshold element [19]. A binary multithreshold neuron
with weight vector $\mathbf{w} = (w_1, \ldots, w_n) \in \mathbb{R}^n$ and threshold vector $\mathbf{t} = (t_1, \ldots, t_k) \in \mathbb{R}^k$ is the
computation unit with $n$ inputs $x_1, \ldots, x_n$ whose single binary output $y$ is calculated by the following rule:
$$y = \begin{cases} 1, &amp; \text{if } t_{2i-1} \le \mathbf{w} \cdot \mathbf{x} &lt; t_{2i} \text{ for some } i \in \{1, \ldots, \lceil k/2 \rceil\}, \\ 0, &amp; \text{otherwise}, \end{cases} \quad (1)$$
where $\mathbf{x} = (x_1, \ldots, x_n) \in \mathbb{R}^n$ is an input vector, $\mathbf{w} \cdot \mathbf{x} = w_1 x_1 + \ldots + w_n x_n$ is the dot product of the vectors $\mathbf{w}$
and $\mathbf{x}$ (the weighted sum of inputs), $\lceil k/2 \rceil$ denotes the least integer not smaller than $k/2$, $t_1 \le t_2 \le \ldots \le t_k$, and
$t_0 = -\infty$, $t_{k+1} = +\infty$ are additional thresholds used for convenience only. Multithreshold
elements outperform single-threshold ones [18, 20], because they are activated when the sum of
weighted inputs lies within one of the given disjoint half-open intervals, which are specified by the
ordered sequence of their thresholds [21].</p>
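      <p>For illustration, rule (1) can be evaluated directly in a few lines of NumPy: np.searchsorted with side="right" returns the number of thresholds not exceeding the weighted sum, i.e., the index of the part of the threshold partition into which the input falls, and the parity of this index gives the binary output. The function name and the sample values below are ours; this is a minimal sketch, not code from the original study.</p>
      <p>import numpy as np

def binary_multithreshold(w, t, x):
    """Binary k-threshold unit, rule (1): output 1 iff the weighted sum w . x
    falls into one of the half-open "positive" intervals fixed by the ordered
    thresholds t."""
    s = float(np.dot(w, x))                          # weighted sum of inputs
    part = int(np.searchsorted(t, s, side="right"))  # index of the part containing s
    return part % 2                                  # odd-numbered parts are positive

# Example for n = 2, k = 3 (the case illustrated in Figure 1); values are illustrative
w = np.array([1.0, -1.0])
t = np.array([-1.0, 0.5, 2.0])                       # t1, t2, t3 in non-decreasing order
print(binary_multithreshold(w, t, np.array([2.0, 1.0])))  # w . x = 1.0, part 2, output 0</p>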
      <p>
        But the increase in the recognition capability of multithreshold units is not gratuitous. One must pay
a high price for it, which consists in the difficulty of the learning of such units [
        <xref ref-type="bibr" rid="ref7">7, 22</xref>
        ], because
the respective learning task is NP-hard even in the case of a unit with two thresholds. This
research has two main goals:
• The study of a model of multi-valued multithreshold neuron that should effectively use
the advantages of multiple thresholds, be suitable for multiclass classification, and admit
fairly simple training techniques.
• The development of the learning algorithm for such units and the study of its fitness for
intended applications in classification.
      </p>
      <p>
        The paper has the following structure. First, the works related to the topic of the study are
reviewed. Then, two models of multithreshold neural units are considered: binary-valued and
multi-valued, respectively. We discuss their advantages and some downsides related
to the complexity of their learning. In the next section, two learning algorithms are described,
which are designed for the training of a single k-threshold neuron. For both algorithms,
conditions on the learning rate are stated that ensure the finiteness of the learning in the
case of their application to strongly k-separable sets. Next, the simulation results on the
performance of trained multiclass k-threshold neural classifiers are discussed in
comparison with some other popular classifiers provided by the Scikit-learn library [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Finally, the last two
sections contain the discussion of the obtained results and the conclusions.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>
        The study of multithreshold neural units has a long history [19, 23, 24]. Multithreshold neural
elements were introduced in the early studies in threshold logic [19, 25]. As mentioned above,
the additional thresholds were proposed with the intention to increase the capacity of the basic
single-threshold element [19, 26]. Some properties of multithreshold neurons were stated in [22, 25,
26]. These works mostly dealt with the recognition capacity of multithreshold elements [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Issues related to the synthesis of multithreshold devices remained almost untouched, because
few algorithms for training such multithreshold units and networks had been developed [18, 24].
Therefore, the applications of devices using the multithreshold approach were almost unknown [27]
despite the better capabilities of multithreshold units compared to the classical linear threshold
units [20, 26]. The hardness results from [
        <xref ref-type="bibr" rid="ref15">15, 22</xref>
        ] can explain these difficulties for the practical
application of bithreshold systems to some extent. Nevertheless, as stated in [
        <xref ref-type="bibr" rid="ref10 ref8">8, 10, 28</xref>
        ], the lack
of learning techniques for multithreshold systems caused the decline of interest in their study.
      </p>
      <p>
        But recent advances in multithreshold logic changed the situation [
        <xref ref-type="bibr" rid="ref14 ref7">7, 14</xref>
        ]. One of the reasons
was the development of new approaches to the synthesis of ANN with hidden layers consisting of neurons with
bithreshold activation functions [
        <xref ref-type="bibr" rid="ref14">14, 20</xref>
        ]. They were developed on the basis of the generalization of
Baum’s synthesis algorithm [29] for threshold networks to the case of bithreshold nodes [
        <xref ref-type="bibr" rid="ref14">14, 28</xref>
        ].
      </p>
      <p>
        The advance in the application of so-called bithreshold networks was stated in [
        <xref ref-type="bibr" rid="ref1 ref10">1, 10</xref>
        ], where
such networks were considered effective tools capable of solving typical problems
of intelligent data processing and computational intelligence. The limitations and downsides of
the basic bithreshold ANN from [
        <xref ref-type="bibr" rid="ref14 ref7">7, 14</xref>
        ] were stated in [28]. Hybrid models of the multiclass
classifier with heterogeneous hidden layers were proposed in [28], where other kinds of neural
units (e.g., WTA and single-threshold units) were used in order to enhance the network performance
and reduce its drawbacks. It should be noted that bithreshold ANN can be useful not only in
classifiers. Their potential applications are considerably wider [
        <xref ref-type="bibr" rid="ref2 ref6 ref8 ref9">2, 6, 8, 9</xref>
        ]. E.g., they were mentioned
in the design of powerful deep ANN providing an exponential improvement of the memorization
capacity [16]. The bithreshold approach was primarily employed for the solution of real-valued
problems [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. But it admits the generalization to the complex domain [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Complex analogs
of the bithreshold activation can be proposed [30] that extend the capacity of complex-valued
threshold neural units. This makes the multithreshold approach applicable to the processing of data in the
complex domain [17, 28].
      </p>
      <p>
        It should be noted that the above-mentioned advance in the application of multithreshold
systems is actually related only to bithreshold models [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The examples of successful application
of general multithreshold models with an arbitrary number of thresholds are unknown [
        <xref ref-type="bibr" rid="ref14">14, 30</xref>
        ].
It became evident that additional study is necessary before such models can be employed in
machine learning systems [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. One of the first steps in this direction was the paper [22], where general k-threshold neural
units were treated in the case $k \ge 2$. As was observed in [22], the parity of $k$ has a great
influence on the properties of multithreshold neurons. Moreover, every multithreshold unit can
be realized using a small threshold circuit, and, consequently, every multithreshold network can
be replaced by an equivalent network consisting solely of bithreshold and threshold nodes [30].
Notice also that, unlike the learning of a single linear threshold unit, the learning of a
multithreshold unit proved to be NP-hard [22], confirming the similar result on the intractability of the
learning of a single bithreshold unit [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        Notice that all the mentioned applications of bithreshold and k-threshold neurons have binary
outputs [28]. Thus, their employment in classifiers requires a special shape of the network
output layer with a separate neuron for every class and the use of the “one versus all” approach in
the learning or synthesis [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. In some cases, a single-output multi-valued neuron is preferable
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], because its application results in a network having fewer nodes and weight coefficients.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Models and methods</title>
      <sec id="sec-3-1">
        <title>3.1. Two models of multithreshold neural units</title>
      </sec>
      <sec id="sec-3-2">
        <title>3.1.1. Model of binary-valued k-threshold neuron</title>
        <p>Let us consider again the model of a k-threshold binary-valued neuron with the weight vector $\mathbf{w}$ and
(ordered) threshold vector $\mathbf{t}$, whose output is given by (1). Note that its performance can be
described as follows:
$$y = \begin{cases} 0, &amp; \text{if } \mathbf{w} \cdot \mathbf{x} &lt; t_1, \\ 1, &amp; \text{if } t_1 \le \mathbf{w} \cdot \mathbf{x} &lt; t_2, \\ \;\;\vdots \\ (1 + (-1)^k)/2, &amp; \text{if } t_{k-1} \le \mathbf{w} \cdot \mathbf{x} &lt; t_k, \\ (1 - (-1)^k)/2, &amp; \text{if } t_k \le \mathbf{w} \cdot \mathbf{x}. \end{cases} \quad (2)$$</p>
        <p>Model (2) has a simple geometrical interpretation [22, 26]. The family of parallel hyperplanes
$H_j: \mathbf{w} \cdot \mathbf{x} = t_j$, $j \in \{1, \ldots, k\}$, divides the space $\mathbb{R}^n$ into $k + 1$ parts, which can be successively labeled
by the numbers 0, 1, …, k. All points belonging to “even” parts are attributed as “negative” ones.
The remaining parts are considered “positive” [22]. The illustration is shown in Figure 1, where the
case $n = 2$, $k = 3$ is considered.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.1.2. Model of multi-valued k-threshold neuron</title>
        <p>A multi-valued modification of the model (2) can be considered [18, 23] that keeps the
capacity of the base model and is easier to train [24]. This multithreshold model uses the
same weight vector $\mathbf{w}$ and threshold vector $\mathbf{t}$, but differs in the output range of the neuron. To be
more precise, the range set of the k-threshold multi-valued neuron is $Z_{k+1} = \{0, 1, \ldots, k\}$, and the
neuron output $y$ satisfies the following condition:
$$y = f_{\mathbf{t}}(\mathbf{w} \cdot \mathbf{x}), \quad (3)$$
where
$$f_{\mathbf{t}}(x) = \begin{cases} 0, &amp; \text{if } x &lt; t_1, \\ 1, &amp; \text{if } t_1 \le x &lt; t_2, \\ \;\;\vdots \\ k - 1, &amp; \text{if } t_{k-1} \le x &lt; t_k, \\ k, &amp; \text{if } t_k \le x. \end{cases} \quad (4)$$</p>
        <p>Consider again the geometrical illustration, now for the k-threshold multi-valued neuron (3),
(4). As shown in Figure 2, the performance of the neuron is also defined by the parallel
hyperplanes $H_j: \mathbf{w} \cdot \mathbf{x} = t_j$, $j \in \{1, \ldots, k\}$, which partition the space $\mathbb{R}^n$ into $k + 1$ parts.
These parts are also labeled by the indices 0, 1, …, k corresponding to the output value of the
(multi-valued) neuron whose activation is given by (4). Notice that the same points are used in both
Figure 1 and Figure 2, but their partition into classes differs, because there are only two classes for the
binary-valued k-threshold neuron and $k + 1$ classes for its multi-valued counterpart [22].</p>
        <p>The pair $(\mathbf{w}, \mathbf{t})$ completely defines the multi-valued multithreshold neuron and is called its
structure pair. Let $A$ be an arbitrary set in $\mathbb{R}^n$. Then every multi-valued k-threshold neuron with
structure pair $(\mathbf{w}, \mathbf{t})$ performs the (ordered) partition $(A_0, A_1, \ldots, A_k)$ of the set $A$, where
$$A_i = \{\mathbf{x} \in A \mid f_{\mathbf{t}}(\mathbf{w} \cdot \mathbf{x}) = i\}, \quad i = 0, 1, \ldots, k. \quad (5)$$
This partition is called an ordered k-threshold partition of the set $A$, whereas the sets $A_0, A_1, \ldots, A_k$
are called strongly k-separable (compare with [22]). Note that the order matters for strongly
separable sets. Sets $A_0, A_1, \ldots, A_k$ are called k-separable if there exists a permutation $\pi: Z_{k+1} \to Z_{k+1}$
such that the sets $A_{\pi(0)}, A_{\pi(1)}, \ldots, A_{\pi(k)}$ are strongly k-separable [22].</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.2. Learning algorithms for multithreshold neurons</title>
      </sec>
      <sec id="sec-3-5">
        <title>3.2.1. Initial reduction of the task</title>
        <p>Let $A_0, A_1, \ldots, A_k$ be strongly k-separable finite sets. Consider the task of the search for a
multi-valued k-threshold neuron with structure pair $(\mathbf{w}, \mathbf{t})$ that performs the desired partition
$(A_0, A_1, \ldots, A_k)$, satisfying (5), of the set $A$ that is the union of the (disjoint) sets $A_0, A_1, \ldots, A_k$.</p>
        <p>Consider how one can reduce the above task to the solution of a homogeneous system of
linear inequalities in $n + k$ variables $w_1, \ldots, w_n, t_1, \ldots, t_k$. It is possible to rewrite (3)-(5) as follows:
$$\begin{cases} \mathbf{w} \cdot \mathbf{x} &lt; t_1, &amp; \text{if } \mathbf{x} \in A_0, \\ t_j \le \mathbf{w} \cdot \mathbf{x} &lt; t_{j+1}, &amp; \text{if } \mathbf{x} \in A_j \; (1 \le j &lt; k), \\ \mathbf{w} \cdot \mathbf{x} \ge t_k, &amp; \text{if } \mathbf{x} \in A_k. \end{cases} \quad (6)$$</p>
        <p>Since the sets $A_0, A_1, \ldots, A_k$ are finite and strongly k-separable, system (6) has solutions, which
compose an $(n + k)$-dimensional convex set. If all non-strict inequalities in (6) were replaced by strict
ones, then the resulting system would also have solutions. Let $\mathbf{v} = (w_1, \ldots, w_n, -t_1, \ldots, -t_k)$ and
$$a_j(x_1, \ldots, x_n) = (x_1, \ldots, x_n, \underbrace{0, \ldots, 0}_{j-1}, 1, \underbrace{0, \ldots, 0}_{k-j}). \quad (7)$$
The chained inequality $t_j \le \mathbf{w} \cdot \mathbf{x} &lt; t_{j+1}$ is equivalent to the system
$$\begin{cases} \mathbf{w} \cdot \mathbf{x} - t_j \ge 0, \\ -\mathbf{w} \cdot \mathbf{x} + t_{j+1} &gt; 0. \end{cases}$$
The last system can be rewritten in the following way:
$$\begin{cases} a_j(\mathbf{x}) \cdot \mathbf{v} \ge 0, \\ -a_{j+1}(\mathbf{x}) \cdot \mathbf{v} &gt; 0. \end{cases} \quad (8)$$
Thus, using (7) and (8), we can reduce system (6) to the following system:
$$\mathbf{b} \cdot \mathbf{v} &gt; 0 \text{ for all } \mathbf{b} \in B, \quad (9)$$
where the finite set $B \subset \mathbb{R}^{n+k}$ is produced by the reduction procedure described below.
Note that there are algorithms solving (9) in polynomial time [<xref ref-type="bibr" rid="ref13">13</xref>]. Thus, the
task of the learning of the k-threshold multi-valued neuron (3)-(4) is not NP-complete.</p>
        <p>The reduction process can be described using the following pseudocode:</p>
        <p>ReduceSet(A_0, A_1, ..., A_k)
1   B ← ∅
2   for x in A_0:
3       add −a_1(x) into B
4   for i in 1, ..., k − 1:
5       for x in A_i:
6           add a_i(x) into B
7           add −a_{i+1}(x) into B
8   for x in A_k:
9       add a_k(x) into B
10  return B</p>
        <p>Notice that the transformation (7) is used in steps 3, 6, 7, and 9, ensuring the filling of the output
set B.</p>
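        <p>A minimal NumPy sketch of the reduction (the function names are ours; the step comments refer to the pseudocode above) may look as follows:</p>
        <p>import numpy as np

def a(j, x, k):
    """Transformation (7): append k extra coordinates with a single 1 in
    position j, so that a_j(x) . v = w . x - t_j for v = (w, -t)."""
    e = np.zeros(k)
    e[j - 1] = 1.0
    return np.concatenate([np.asarray(x, dtype=float), e])

def reduce_set(classes):
    """ReduceSet: build B from the ordered partition (A_0, ..., A_k), turning
    the learning task into the homogeneous system (9): b . v > 0 for b in B."""
    k = len(classes) - 1
    B = [-a(1, x, k) for x in classes[0]]          # step 3
    for i in range(1, k):
        for x in classes[i]:
            B.append(a(i, x, k))                   # step 6
            B.append(-a(i + 1, x, k))              # step 7
    B.extend(a(k, x, k) for x in classes[k])       # step 9
    return np.array(B)</p>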
      </sec>
      <sec id="sec-3-6">
        <title>3.2.2. Online learning algorithm</title>
        <p>Consider the training of the multi-valued k-threshold neural unit to separate finite strongly
k-separable sets $A_0, A_1, \ldots, A_k$.</p>
        <p>Let us describe the online learning algorithm for a k-threshold multi-valued neural unit that
uses ReduceSet($A_0, A_1, \ldots, A_k$) from the previous subsection and an adapted version of the
relaxation algorithm from [31, 32]. The pseudocode of the algorithm is shown in the function
OnlineMultithreshold:</p>
        <p>B  ReduceSet ( A0, A1, , Ak )
v  v0
(i, j,err )  (0,0,1)
while i  r and err  0 :
err  0
shuffle B
for b in B:
s  b  v
if s  0 :</p>
        <p>continue
j  j +1
err  err +1
17 return w,t</p>
        <p>The previous algorithm has four main parameters: $(A_0, A_1, \ldots, A_k)$, an ordered partition
corresponding to strongly k-separable sets; $r$, the number of learning epochs; $v_0$, an initial
approximation; and $\lambda$, the schedule function that defines the behavior of the learning rate. The above algorithm
uses three internal counters: $i$, which is responsible for learning epochs; $j$, responsible for learning
corrections; and $err$, responsible for the unit errors during the current epoch of learning. The
goal of the algorithm is the search for a vector $\mathbf{v} \in \mathbb{R}^{n+k}$ such that for all $\mathbf{b} \in B$ the inequality $\mathbf{v} \cdot \mathbf{b} &gt; 0$
holds. If such a vector is found, then the learning process terminates. Otherwise, the weight
correction occurs in step 13 at least once per epoch. Note that this correction is successful only in
the case $s \ne 0$. Thus, a random initial approximation should be used for $v_0$ in order to avoid the situation
$s = 0$ during the learning. The following proposition states conditions ensuring the successful
completion of the online learning using the above algorithm.</p>
        <p>Proposition 1. If $A = A_0 \cup A_1 \cup \ldots \cup A_k$, the sets $A_0, A_1, \ldots, A_k$ are finite and strongly k-separable,
and the learning rate satisfies
$$0 &lt; \lambda(j) \le 2, \quad 0 &lt; \lambda_{\min} \le \lambda(j) \le \lambda_{\max}, \quad (10)$$
where $j$ is a correction step and $s(j)$ is the dot product obtained in step 8 before the $j$th correction,
then there exists $r$ such that OnlineMultithreshold produces a structure pair $(\mathbf{w}, \mathbf{t})$ of a
multi-valued k-threshold neuron which satisfies (6) and performs the desired partition of the set $A$.</p>
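        <p>Under our reading of step 13 (the classical relaxation correction, consistent with condition (10) and with the role of $s(j)$ in Proposition 1), the online algorithm admits the following NumPy sketch; reduce_set is the sketch from Section 3.2.1, and all names are ours:</p>
        <p>import numpy as np

def online_multithreshold(classes, r, v0, lam, seed=0):
    """A sketch of OnlineMultithreshold. Assumed correction (step 13):
    v = v - lam(j) * s * b / ||b||^2, which adds a positive multiple of a
    violated b, since s = b . v is non-positive there."""
    rng = np.random.default_rng(seed)
    B = reduce_set(classes)
    v = np.asarray(v0, dtype=float).copy()
    j = 0
    for _ in range(r):                        # at most r learning epochs
        err = 0
        rng.shuffle(B)                        # step 6: shuffle the training set
        for b in B:
            s = b @ v                         # step 8: dot product
            if s > 0:
                continue                      # inequality already satisfied
            j += 1
            err += 1
            v = v - lam(j) * s * b / (b @ b)  # step 13: relaxation correction
        if err == 0:                          # v solves system (9)
            break
    n = v.size - (len(classes) - 1)
    return v[:n], -v[n:]                      # structure pair (w, t)

# Example call with the constant schedule used in Section 4:
# w, t = online_multithreshold([A0, A1, A2], r=100,
#                              v0=np.random.randn(n + 2), lam=lambda j: 2.0)</p>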
      </sec>
      <sec id="sec-4-2">
        <title>3.2.3. Offline learning algorithm</title>
        <p>Let us describe the offline approach to the learning of a k-threshold multi-valued neural unit. It is
designed using a modification of the offline spectral algorithm from [32] adapted to solving
system (9). Let $B = \{\mathbf{b}_1, \ldots, \mathbf{b}_m\}$ be a finite subset of $\mathbb{R}^{n+k}$, and let $\mathbf{v} \in \mathbb{R}^{n+k}$. We will need the following
notations:
$$s(B) = \sum_{i=1}^{m} \mathbf{b}_i, \quad g_{\mathbf{v}}(\mathbf{b}) = \operatorname{sgn}(\mathbf{v} \cdot \mathbf{b}), \quad g_{\mathbf{v}}(B) = (g_{\mathbf{v}}(\mathbf{b}_1), \ldots, g_{\mathbf{v}}(\mathbf{b}_m)), \quad s_{\mathbf{v}}(B) = \sum_{i=1}^{m} g_{\mathbf{v}}(\mathbf{b}_i)\,\mathbf{b}_i, \quad \mathbf{1} = (1, \ldots, 1).$$
Note that both $s(B)$ and $s_{\mathbf{v}}(B)$ belong to $\mathbb{R}^{n+k}$, and the vector $s_{\mathbf{v}}(B)$ can be considered as an
analog of the Fourier coefficients of the function $g_{\mathbf{v}}: B \to \{-1, 0, 1\}$. Consider the following algorithm:</p>
        <p>OfflineMultithreshold(A_0, A_1, ..., A_k, r, v_0, λ)
1   B ← ReduceSet(A_0, A_1, ..., A_k)
2   v ← v_0
3   j ← 0
4   while j ≤ r and g_v(B) ≠ 1:
5       compute s_v(B)
6       j ← j + 1
7       v ← v − λ(j)(s_v(B) − s(B))
8   return w, t</p>
        <p>Note that OfflineMultithreshold($A_0, A_1, \ldots, A_k$, $r$, $v_0$, $\lambda$) has the same input parameters as its
online counterpart from the previous subsection.</p>
        <p>The following proposition states conditions ensuring the successful completion of the offline
learning using the above algorithm.</p>
        <p>Proposition 2. If $A = A_0 \cup A_1 \cup \ldots \cup A_k$, the sets $A_0, A_1, \ldots, A_k$ are finite and strongly k-separable,
and the corrections have the form
$$\mathbf{v}(j) = \mathbf{v}(j-1) - \lambda(j)\,(s_{\mathbf{v}(j-1)}(B) - s(B)),$$
where $j$ is a correction step, $\lambda(j)$ satisfies (10), and $\mathbf{v}(j)$ is the value of the vector $\mathbf{v}$ after the $j$th
correction, then there exists $r$ such that OfflineMultithreshold($A_0, A_1, \ldots, A_k$, $r$, $v_0$, $\lambda$) produces a
structure pair $(\mathbf{w}, \mathbf{t})$ of a multi-valued k-threshold neuron which performs the desired k-threshold
partition of the set $A$.</p>
        <p>Proofs of both propositions are omitted. They can be obtained using reasoning similar to [32].</p>
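        <p>A NumPy sketch of the offline algorithm under the same assumptions (the correction is the one displayed in Proposition 2; reduce_set is the sketch from Section 3.2.1, and all names are ours):</p>
        <p>import numpy as np

def offline_multithreshold(classes, r, v0, lam):
    """A sketch of OfflineMultithreshold: one aggregated correction per epoch,
    v = v - lam(j) * (s_v(B) - s(B)), instead of per-example corrections."""
    B = reduce_set(classes)
    sB = B.sum(axis=0)                       # s(B)
    v = np.asarray(v0, dtype=float).copy()
    j = 0
    for _ in range(r + 1):                   # step 4: while j ≤ r and g_v(B) ≠ 1
        g = np.sign(B @ v)                   # g_v(b_i) for every b_i in B
        if np.all(g == 1):                   # g_v(B) = 1: system (9) is solved
            break
        svB = (g[:, None] * B).sum(axis=0)   # step 5: compute s_v(B)
        j += 1
        v = v - lam(j) * (svB - sB)          # step 7: single correction of the epoch
    n = v.size - (len(classes) - 1)
    return v[:n], -v[n:]                     # structure pair (w, t)</p>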
      </sec>
    </sec>
    <sec id="sec-6">
      <title>4. Experiment and results</title>
      <p>
        Consider the capability of our learning algorithms from the previous section to train a
multi-valued multithreshold-based classifier to solve classification problems on some benchmarks.
Let us compare their performance with well-known classification methods, such as the classical
perceptron, the nearest neighbor classifier, random forest and the feed-forward ANN (multilayer
perceptron). The classifiers were compared on the following two real-world datasets: “balance-scale”
(Balance Scale Weight &amp; Distance Database) and “dry-bean” (Dry Bean Dataset) [33, 34], provided
by the UC Irvine Machine Learning Repository [35]. The datasets contain 625 and 13611 learning
instances from 3 and 7 classes, respectively [33, 35]. The first dataset has 5 features, the second
one has 16 [33]. 25% of the instances of every dataset were used as the test set, and the remaining
75% were used as the training set. In order to obtain consistent results [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], the repeated random subsampling
validation [
        <xref ref-type="bibr" rid="ref11">11, 36</xref>
        ] was used. The learning experiments were repeated 500 times for every dataset, and
the obtained results were averaged with respect to the accuracy on the training and test sets.
      </p>
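      <p>Under the assumptions that the datasets can be fetched from OpenML [33] under the names used below and that a handful of repetitions stands in for the 500 reported above, the repeated random subsampling protocol for the four baseline classifiers can be sketched as follows (a hedged sketch, not the exact experimental code):</p>
      <p>from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import Perceptron
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

# "balance-scale" is assumed to be available under this OpenML name
X, y = fetch_openml("balance-scale", version=1, return_X_y=True)
models = {
    "kNN (5 neighbors)": KNeighborsClassifier(n_neighbors=5),
    "perceptron": Perceptron(max_iter=1000),
    "random forest": RandomForestClassifier(),
    "MLP (one hidden layer)": MLPClassifier(hidden_layer_sizes=(100,), max_iter=200),
}
reps = 10                                        # 500 repetitions in the paper
for name, model in models.items():
    acc = 0.0
    for rep in range(reps):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=rep)
        acc += model.fit(X_tr, y_tr).score(X_te, y_te)
    print(name, acc / reps)</p>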
      <p>Default values of the parameters recommended by the Scikit-learn library were used during the training
experiments for the first four classical classifiers: 5 neighbors for the nearest neighbor classifier, 1000
iterations for the linear perceptron classifier, unbounded depth for the random forest, and one hidden layer
with 100 nodes and 200 iterations for the multilayer perceptron [36]. The constant learning rate
$\lambda = 2$ was used for both multithreshold algorithms, as well as random initial approximations
$w_0$, $t_0$. The datasets are not provided with an ordered partition into classes [35], so the classes were
ordered using the alphabetical order induced by their labels. The following table contains the results
of the experiments.
By analyzing the data from Table 1, we can conclude that:
• Both multithreshold algorithms performed well on the relatively easy small 3-class
classification task on the balance-scale dataset, and the online modification had the second-best
accuracy on the test set.
• Classification on the dry-bean dataset was a more difficult task for almost all the classifiers
considered during the simulation. Learning of both the linear perceptron and the multilayer perceptron
failed completely. The multi-valued multithreshold neuron yielded by OfflineMultithreshold
performed better than the neuron produced by the online algorithm and had the best accuracy
among all the neural-like models considered. But its accuracy was considerably
worse than in the case of the random forest classifier.</p>
    </sec>
    <sec id="sec-7">
      <title>5. Discussions</title>
      <p>Two versions of the learning algorithm for multi-valued multithreshold neurons have been
proposed. The simulation results show that both algorithms are capable of yielding classifiers
suitable for the solution of classification problems when the number of classes is
relatively small. But the performance of both algorithms decreases as the number
of classes increases. This seems to be due to at least two reasons.</p>
      <p>
        The first one is the small number of parameters of the multithreshold model compared to
other classifiers, which often use “one versus all” scheme [
        <xref ref-type="bibr" rid="ref11">11, 36</xref>
        ]. It seems that the above drawback
can be overcome by using multithreshold networks [29] or more powerful neuron models with
multithreshold activation, e.g., polynomial neurons [23, 30, 32].
      </p>
      <p>The second reason is caused by the nature of the datasets related to the majority of classification
problems. They contain training pairs, each of which consists of a pattern and its class label. In
terms of partitions, we deal with an unordered partition, while the proposed learning algorithms
are designed to work with strongly k-separable sets corresponding to an ordered partition. The
question arises of how to convert an unordered partition into an ordered one. Brute force is not
effective due to the fast growth of the factorial, as illustrated in the sketch below. Numerous heuristics
can be used in order to increase the performance of the multithreshold neurons. This problem
deserves a separate consideration.</p>
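      <p>As a small illustration, the ordering convention used in Section 4 and the cost of the brute-force alternative can be expressed as follows (the sample labels are taken from the dry-bean dataset; the helper name is ours):</p>
      <p>from math import factorial

def alphabetical_ordering(labels):
    """The ordering used in Section 4: class labels are sorted alphabetically,
    and the position in the sorted list becomes the class index 0, ..., k."""
    return {c: i for i, c in enumerate(sorted(set(labels)))}

# Brute force over all orderings multiplies the training cost by (k + 1)!,
# e.g., 5040 orderings for the 7-class dry-bean task:
print(factorial(7))                                            # 5040
print(alphabetical_ordering(["SEKER", "BARBUNYA", "BOMBAY"]))  # {'BARBUNYA': 0, 'BOMBAY': 1, 'SEKER': 2}</p>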
    </sec>
    <sec id="sec-8">
      <title>6. Conclusions</title>
      <p>The problem of the application of multithreshold multi-valued neural units has been
considered. These units separate the sets of patterns in n-dimensional vector space using parallel
hyperplanes. This ability allows them to become candidates for computational nodes of
multiclass ANN classifiers. Thus, the development of learning methods for such networks is important.</p>
      <p>The simplest case of this learning problem has been treated, namely, the issues concerning the
learning of a single multi-valued multithreshold neuron. Two approaches to the training of the
multithreshold neuron have been developed. Both of them require a simple preliminary transformation
of the patterns in order to reduce a given multiclass task to the corresponding binary classification
task. The online version of the learning algorithm is simpler and often faster. The offline
modification performs a single correction during each learning epoch; it is usually more expensive but often
yields a neuron having somewhat better classification accuracy. Conditions have been
stated ensuring the finiteness of the learning process in the case of the application of both algorithms
to the training on strongly k-separable sets.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V.K.</given-names>
            <surname>Venkatesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.T.</given-names>
            <surname>Ramakrishna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Batyuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Havrysh</surname>
          </string-name>
          ,
          <article-title>High-Performance artificial intelligence recommendation of quality research papers using effective collaborative approach</article-title>
          ,
          <source>Systems 11.2</source>
          (
          <year>2023</year>
          ):
          <fpage>81</fpage>
          . doi:
          <volume>10</volume>
          .3390/systems11020081.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>I. Izonin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tkachenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Mitoulis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Faramarzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Tsmots</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mashtalir</surname>
          </string-name>
          ,
          <article-title>Machine learning for predicting energy efficiency of buildings: a small data approach</article-title>
          , in: Procedia Computer Science, volume
          <volume>231</volume>
          ,
          <year>2024</year>
          , pp.
          <fpage>72</fpage>
          -
          <lpage>77</lpage>
          . doi:
          <volume>10</volume>
          .1016/j.procs.
          <year>2023</year>
          .
          <volume>12</volume>
          .173.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Geche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kotsovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Batyuk</surname>
          </string-name>
          ,
          <article-title>Synthesis of the integer neural elements</article-title>
          ,
          <source>in: Proceedings of the International Conference on Computer Sciences and Information Technologies</source>
          ,
          <string-name>
            <surname>CSIT</surname>
          </string-name>
          <year>2015</year>
          , Lviv, Ukraine,
          <year>2015</year>
          , pp.
          <fpage>121</fpage>
          -
          <lpage>136</lpage>
          . doi:
          <volume>10</volume>
          .1109/STC-CSIT.
          <year>2015</year>
          .
          <volume>7325432</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mitsa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Repariuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sharkan</surname>
          </string-name>
          ,
          <article-title>Identification of authorship of Ukrainian-language texts of journalistic style using neural networks</article-title>
          ,
          <source>Eastern-European Journal of Enterprise Technologies 1</source>
          <volume>.2</volume>
          (
          <issue>103</issue>
          ) (
          <year>2020</year>
          ):
          <fpage>30</fpage>
          -
          <lpage>36</lpage>
          . doi:
          <volume>10</volume>
          .15587/
          <fpage>1729</fpage>
          -
          <lpage>4061</lpage>
          .
          <year>2020</year>
          .
          <volume>195041</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Havryliuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hovdysh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tolstyak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chopyak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kustra</surname>
          </string-name>
          ,
          <article-title>Investigation of PNN optimization methods to improve classification performance in transplantation medicine</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3609</volume>
          ,
          <year>2023</year>
          , pp.
          <fpage>338</fpage>
          -
          <lpage>345</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>O.</given-names>
            <surname>Mitsa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sharkan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Maksymchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Varha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shkurko</surname>
          </string-name>
          ,
          <article-title>Ethnocultural, educational and scientific potential of the interactive dialects map</article-title>
          ,
          <source>in: Proceedings of 2023 IEEE International Conference on Smart Information Systems and Technologies (SIST)</source>
          ,
          <year>Astana</year>
          ,
          <year>2023</year>
          , pp.
          <fpage>226</fpage>
          -
          <lpage>231</lpage>
          . doi:
          <volume>10</volume>
          .1109/SIST58284.
          <year>2023</year>
          .
          <volume>10223544</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kotsovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Batyuk</surname>
          </string-name>
          ,
          <article-title>Representational capabilities and learning of bithreshold neural networks</article-title>
          , in: S. Babichev et al. (Eds),
          <source>Advances in Intelligent Systems and Computing</source>
          , volume
          <volume>1246</volume>
          , Springer, Cham,
          <year>2021</year>
          , pp.
          <fpage>499</fpage>
          -
          <lpage>514</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Tkachenko</surname>
          </string-name>
          ,
          <article-title>An integral software solution of the SGTM neural-like structures implementation for solving different Data Mining tasks</article-title>
          , in: S.
          <string-name>
            <surname>Babichev</surname>
          </string-name>
          , V. Lytvynenko (Eds.),
          <source>Lecture Notes on Data Engineering and Communications Technologies</source>
          , volume
          <volume>77</volume>
          , Springer, Cham,
          <year>2022</year>
          , pp.
          <fpage>696</fpage>
          -
          <lpage>713</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Mitsa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sharkan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vargha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lupei</surname>
          </string-name>
          ,
          <article-title>Analyzing Ukrainian media texts by means of support vector machines: aspects of language and copyright</article-title>
          , in: Z. Hu.,
          <string-name>
            <surname>I. Dychka</surname>
          </string-name>
          , M. He (Eds.),
          <article-title>Advances in Computer Science for Engineering and Education VI</article-title>
          .
          <source>ICCSEEA 2023, Lecture Notes on Data Engineering and Communications Technologies</source>
          , volume
          <volume>181</volume>
          , Springer, Cham,
          <year>2023</year>
          , pp.
          <fpage>173</fpage>
          -
          <lpage>182</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E.H.</given-names>
            <surname>Houssein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.E.</given-names>
            <surname>Hosney</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.M. Emam</surname>
            ,
            <given-names>E.M.</given-names>
          </string-name>
          <string-name>
            <surname>Younis</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          <string-name>
            <surname>Ali</surname>
            ,
            <given-names>W.M.</given-names>
          </string-name>
          <string-name>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <article-title>Soft computing techniques for biomedical data analysis: open issues and challenges</article-title>
          ,
          <source>Artificial Intelligence Review</source>
          <volume>56</volume>
          (
          <year>2023</year>
          ):
          <fpage>2599</fpage>
          -
          <lpage>2649</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Géron</surname>
          </string-name>
          ,
          <article-title>Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems</article-title>
          ,
          <string-name>
            <given-names>O</given-names>
            <surname>'Reilly Media</surname>
          </string-name>
          , Sebastopol, CA,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Setoodeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Habibi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Haykin</surname>
          </string-name>
          ,
          <source>Nonlinear Filters: Theory and Applications</source>
          , Wiley, New York, NY,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Anthony</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ratsaby</surname>
          </string-name>
          ,
          <article-title>Large-width machine learning algorithm</article-title>
          ,
          <source>Progress in Artificial Intelligence</source>
          <volume>9</volume>
          .3 (
          <year>2020</year>
          ):
          <fpage>275</fpage>
          -
          <lpage>285</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kotsovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Batyuk</surname>
          </string-name>
          ,
          <article-title>Feed-forward neural network classifiers with bithreshold-like activations</article-title>
          ,
          <source>in: Proceedings of IEEE 17th International Scientific and Technical Conference on Computer Sciences and Information Technologies</source>
          ,
          <string-name>
            <surname>CSIT</surname>
          </string-name>
          <year>2022</year>
          , Lviv, Ukraine,
          <year>2022</year>
          , pp.
          <fpage>9</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Blum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rivest</surname>
          </string-name>
          ,
          <article-title>Training a 3-node neural network is NP-complete</article-title>
          ,
          <source>Neural Networks 5.1</source>
          (
          <year>1992</year>
          ):
          <fpage>117</fpage>
          -
          <lpage>127</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] S. Rajput, K. Sreenivasan, D. Papailiopoulos, A. Karbasi, An exponential improvement on the memorization capacity of deep threshold networks, in: Advances in Neural Information Processing Systems, volume 16, 2021, pp. 12674–12685.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] Z.-G. Zhang, Y.-L. Xiao, J. Zhong, Unitary learning in conditional models for deep optics neural networks, in: Proceedings of SPIE – The International Society for Optical Engineering, volume 12565, 2023, no. 1256543.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] N. Jiang, Z. Zhang, X. Ma, J. Wang, Y. Yang, Analysis of nonseparable property of multi-valued multi-threshold neuron, in: Proceedings of 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 2008, pp. 413–419. doi: 10.1109/IJCNN.2008.4633825.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] D. R. Haring, Multi-threshold threshold elements, IEEE Transactions on Electronic Computers EC-15.1 (1966): 45–65.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] I. Prokić, Characterization of multiple-valued threshold functions in the Vilenkin-Chrestenson basis, Journal of Multiple-Valued Logic and Soft Computing 34.3-4 (2020): 223–238.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] R. Takiyama, The separating capacity of a multithreshold threshold element, IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-7.1 (1985): 112–116.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] V. Kotsovsky, A. Batyuk, Multithreshold neural units and networks, in: Proceedings of IEEE 18th International Conference on Computer Sciences and Information Technologies, CSIT 2023, Lviv, Ukraine, 2023, pp. 1–5. doi: 10.1109/CSIT61576.2023.10324129.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] N. Jiang, Y. X. Yang, X. M. Ma, Z. Z. Zhang, Using three layer neural network to compute multi-valued functions, in: 2007 Fourth International Symposium on Neural Networks, June 3-7, 2007, Nanjing, P.R. China, Part III, LNCS 4493, 2007, pp. 1–8.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] M. Anthony, Learning multivalued multithreshold functions, CDAM Research Report No. LSE-CDAM-2003-03, London School of Economics, 2003.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] V.K. Venkatesan, I. Izonin, J. Periyasamy, A. Indirajithu, A. Batyuk, M.T. Ramakrishna, Incorporation of energy efficient computational strategies for clustering and routing in heterogeneous networks of smart city, Energies 15.20 (2022): 7524.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] S. Olafsson, Y. S. Abu-Mostafa, The capacity of multilevel threshold function, IEEE Transactions on Pattern Analysis and Machine Intelligence 10.2 (1988): 277–281.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] I. Izonin, B. Ilchyshyn, R. Tkachenko, M. Gregus, N. Shakhovska, C. Strauss, Towards data normalization task for the efficient mining of medical data, in: Proceedings of 12th International Conference on Advanced Computer Information Technologies, ACIT 2022, Ruzomberok, Slovakia, 2022, pp. 480–484.</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>[28] V. Kotsovsky, Hybrid 4-layer bithreshold neural network for multiclass classification, in: CEUR Workshop Proceedings, volume 3387, 2023, pp. 212–223.</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>[29] E. B. Baum, On the capabilities of multilayer perceptrons, Journal of Complexity 4.3 (1988): 193–215.</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>[30] V. Kotsovsky, A. Batyuk, V. Voityshyn, On the size of weights for bithreshold neurons and networks, in: Proceedings of IEEE 16th International Conference on Computer Sciences and Information Technologies, CSIT 2021, Lviv, Ukraine, 2021, volume 1, pp. 13–16.</mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>[31] S. Dasgupta, S. Sabato, Robust learning from discriminative feature feedback, in: Proceedings of Machine Learning Research, volume 108, 2020, pp. 973–982.</mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>[32] V. Kotsovsky, A. Batyuk, On-line relaxation versus off-line spectral algorithm in the learning of polynomial neural units, in: S. Babichev et al. (Eds.), Communications in Computer and Information Science, volume 1158, Springer, Cham, 2020, pp. 3–21.</mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>[33] OpenML: A worldwide machine learning lab, 2024. URL: https://openml.org.</mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>[34] M. Lupei, M. Shlahta, O. Mitsa, Y. Horoshko, H. Tsybko, V. Gorbachuk, Development of an interactive map within the implementation of actual state and public directions, in: Proceedings of the 12th International Conference on Advanced Computer Information Technologies, ACIT 2022, Ruzomberok, Slovakia, 2022, pp. 384–387.</mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>[35] M. Kelly, R. Longjohn, K. Nottingham, The UCI machine learning repository, 2023. URL: http://archive.ics.uci.edu.</mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>[36] S. Pölsterl, Scikit-survival: A library for time-to-event analysis built on top of Scikit-learn, Journal of Machine Learning Research 21 (2020): 1–6.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>