=Paper=
{{Paper
|id=Vol-3688/paper4
|storemode=property
|title=Learning of Multi-valued Multithreshold Neural Units
|pdfUrl=https://ceur-ws.org/Vol-3688/paper4.pdf
|volume=Vol-3688
|authors=Vladyslav Kotsovsky
|dblpUrl=https://dblp.org/rec/conf/colins/Kotsovsky24
}}
==Learning of Multi-valued Multithreshold Neural Units==
Vladyslav Kotsovsky1
1 State University “Uzhhorod National University”, Narodna Square 3, Uzhhorod, 88000, Ukraine
Abstract
The issues related to the use of multithreshold neural units in multiclass classification are treated in the
paper. Two models of multi-valued k-threshold neurons are considered. Online and offline modifications
of the learning algorithm are designed to train multithreshold neuron to solve multiclass classification
tasks using simple and fast learning techniques. The conditions are found ensuring the finiteness of the
training. The experiment results demonstrate the performance of multithreshold multiclass classifier
on real-world datasets compared to some popular classifiers.
Keywords
Multithreshold neuron, multi-valued neuron, machine learning, neural network, classification
1. Introduction
Neural-like networks and systems have numerous applications in artificial intelligence [1] and
intelligent data analysis [2]. They are used in modern hardware [3] and software [4] tools and
products [5, 6]. The amazing capacities of artificial neural networks (ANN) are provided by the
appropriate use of the network architecture [7] and related learning techniques [8, 9].
The synergy between the network architecture, the kind of network nodes and the network
learning (or synthesis) procedures is very important in the practice of neural computations [10].
Linear neural units with threshold activation functions [11], binary inputs and output were used
in early models [12]. This kind of computation units was inspired by the models of biological
neurons from the brain study [11]. But both the theoretical studies and practical applications
showed the strong limitations of the basic neuron model of McCulloch and Pitts [12, 13] as well
as difficulties related to the learning of threshold ANNs [14, 15]. To overcome the above-mentioned
limitations and difficulties, many more complicated models of neural devices were
proposed [11, 12]. The overwhelming majority of these models employed one of two ways to increase the
network capacity by enhancing the power of network neurons [10]. The first is based on the use of
more sophisticated models of the aggregation of the input signals of the neural unit instead of the
classical weighted sum of inputs [12], e.g., polynomial threshold units [12, 13]. The second ap-
proach consists in the use of more complicated activation functions instead of the step function
[12] from the Rosenblatt model [16, 17]. Both approaches have their pros and cons discussed in
[10–14].
The multithreshold models were developed under the second approach [18]. One of the earliest
among them was the multithreshold threshold element [19]. A binary multithreshold neuron
with weight vector w = (w_1, …, w_n) ∈ R^n and threshold vector t = (t_1, …, t_k) ∈ R^k is the
computation unit with n inputs x_1, …, x_n whose single binary output y is calculated by the following rule:

y = 1, if t_{2j−1} ≤ w·x < t_{2j} for some j ∈ {1, …, ⌈k/2⌉},
                                                                    (1)
y = 0, if t_{2j} ≤ w·x < t_{2j+1} for some j ∈ {0, 1, …, ⌊k/2⌋},
COLINS-2024: 8th International Conference on Computational Linguistics and Intelligent Systems, April 12–13, 2024,
Lviv, Ukraine
vladyslav.kotsovsky@uzhnu.edu.ua (V. Kotsovsky)
0000-0002-7045-7933 (V. Kotsovsky)
© 2024 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
where x = (x_1, …, x_n) ∈ R^n is an input vector, w·x = w_1x_1 + … + w_nx_n is the dot product of the vectors w
and x (the weighted sum of inputs), ⌊k/2⌋ and ⌈k/2⌉ denote the integer part (floor) and the ceiling of k/2, respectively, t_1 ≤ t_2 ≤ … ≤ t_k, and
t_0 = −∞ and t_{k+1} = +∞ are additional thresholds used for convenience only. Multithreshold elements
outperform single-threshold ones [18, 20], because they are activated when the weighted
sum of inputs falls within one of the given disjoint half-open intervals, which are specified by the
ordered sequence of their thresholds [21].
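Rule (1) can be sketched in a few lines of NumPy; the function name and the use of np.searchsorted are illustrative assumptions, not part of the original model description:

```python
import numpy as np

def binary_multithreshold(x, w, t):
    """Binary k-threshold neuron, rule (1).

    x, w : sequences of length n; t : non-decreasing thresholds t_1 <= ... <= t_k.
    The weighted sum w.x falls into one of the k + 1 half-open intervals
    (-inf, t_1), [t_1, t_2), ..., [t_k, +inf); the intervals are labeled
    0, 1, 0, 1, ... and the neuron outputs the parity of the interval index.
    """
    s = float(np.dot(w, x))
    region = int(np.searchsorted(t, s, side="right"))  # index of the interval containing s
    return region % 2
```

For example, with t = (1, 2, 3) a weighted sum of 1.5 lies in [t_1, t_2) and yields output 1, while 2.5 lies in [t_2, t_3) and yields 0.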
But the increase in the recognition capability of multithreshold units comes at a price, namely
the difficulty of learning such units [7, 22]: the respective learning task
is NP-hard even in the case of a unit with only two thresholds. The
research has two main goals:
• The study of the model of multi-valued multithreshold neuron that should effectively use
the advantages of multiple thresholds, be suitable for the multiclass classification and admits
fairly simple training techniques.
• The development of the learning algorithm for such units and the study of its fitness for
intended applications in classification.
The paper has the following structure. First, the works related to the topic of the study will be
reviewed. Then, two models of multithreshold neural units will be considered: binary-valued and
multi-valued, respectively. We will discuss their advantages and consider some downsides related
to the complexity of their learning. In the next section two learning algorithms will be described,
which are designed for the learning of a single k-threshold neuron. For both algorithms,
conditions on the learning rate will be stated that ensure the finiteness of the learning in the
case of their application to strongly k-separable sets. Next, the simulation results on the
performance of trained multiclass k-threshold neural classifiers will be discussed in
comparison with some other popular classifiers provided by the Scikit-Learn library [11]. Finally, the last
two sections contain the discussion of the obtained results and conclusions.
2. Related works
The study of multithreshold neural units has a long history [19, 23, 24]. Multithreshold neural
elements were introduced in the early studies in threshold logic [19, 25]. As mentioned above,
the additional thresholds were proposed with intention to increase the capacities of basic single-
threshold element [19, 26]. Some properties of multithreshold neurons were stated in [22, 25,
26]. These works mostly dealt with the recognition capacity of multithreshold elements [2].
Issues related to the synthesis of multithreshold devices remained almost untouched, because
few algorithms for training such multithreshold units and networks had been developed [18, 24].
Therefore, the applications of devices using multithreshold approach were almost unknown [27]
despite the better capabilities of multithreshold units compared to the classical linear threshold
units [20, 26]. The hardness results from [15, 22] can explain these difficulties for the practical
application of bithreshold systems to some extent. Nevertheless, as stated in [8, 10, 28], the lack
of learning techniques for multithreshold systems caused the decline of interest in their study.
But recent advances in multithreshold logic changed the situation [7, 14]. One of the reasons
was new approaches to the synthesis of ANNs with hidden layers consisting of neurons with
bithreshold activation functions [14, 20]. They were developed on the basis of the generalization of the
Baum’s synthesis algorithm [29] for threshold networks in the case of bithreshold nodes [14, 28].
The advance in the application of so-called bithreshold networks was stated in [1, 10], where
such networks were considered as effective tools capable of solving typical problems
of intelligent data processing and computational intelligence. The limitations and downsides of
the basic bithreshold ANN from [7, 14] were stated in [28]. Hybrid models of the multiclass
classifier with heterogeneous hidden layers were proposed in [28], where other kinds of neural
units (e.g., WTA and single-threshold units) were used in order to enhance network performance
and reduce its drawbacks. It should be noted that bithreshold ANNs can be useful not only in
classifiers. Their potential applications are considerably wider [2, 6, 8, 9]. E.g., they were mentioned
in design of powerful deep ANN providing the exponential improvement of the memorization
capacity [16]. The bithreshold approach was primarily employed for the solution of real-valued
problems [10]. But it admits a generalization to the complex domain [14]. Complex analogs
of bithreshold activation can be proposed [30] that extend the capacity of complex-valued
threshold neural units. This makes the multithreshold approach applicable to the processing of data in the
complex domain [17, 28].
It should be noted that the above-mentioned advance in the application of multithreshold
systems is actually related to only bithreshold models [7]. The examples of successful application
of general multithreshold models with an arbitrary number of thresholds are unknown [14, 30].
It became evident that additional study is necessary before such models can be employed in
machine learning systems [10]. One such study was the paper [22], where general k-threshold neural
units were treated in the case k ≥ 2. As was observed in [22], the parity of k has a great
influence on the properties of multithreshold neurons. Moreover, every multithreshold unit can
be realized using a small threshold circuit, and, consequently, every multithreshold network can
be replaced by the equivalent networks consisting solely of bithreshold and threshold nodes [30].
Notice also that unlike the learning of a single threshold linear unit, the learning of a multithre-
shold unit proved to be NP-hard [22] confirming the similar result of the intractability of the
learning of a single bithreshold unit [14].
Notice that all mentioned applications of bithreshold and k-threshold neurons have binary
outputs [28]. Thus, their employment in classifiers requires a special shape of the network
output layer, with a separate neuron for every class, and the use of the "one versus all" approach in
learning or synthesis [11]. In some cases, a single-output multi-valued neuron is preferable
[12], because its application results in the network having fewer nodes and weight coefficients.
3. Models and methods
3.1. Two models of multithreshold neural units
3.1.1. Model of binary-valued k-threshold neuron
Let us consider again a model of a k-threshold binary-valued neuron with weight vector w and
(ordered) threshold vector t, whose output is given by (1). Note that its performance can be
described as follows:

y = 0, if w·x < t_1,
y = 1, if t_1 ≤ w·x < t_2,
................................................                    (2)
y = (1 + (−1)^k)/2, if t_{k−1} ≤ w·x < t_k,
y = (1 − (−1)^k)/2, if t_k ≤ w·x.
Model (2) has a simple geometrical interpretation [22, 26]. The family of parallel hyperplanes
H_j: w·x = t_j, j ∈ {1, …, k}, divides the space R^n into k + 1 parts, which can be successively labeled
by the numbers 0, 1, …, k. All points belonging to "even" parts are labeled as "negative" ones. The
remaining parts are considered "positive" [22]. The illustration is shown in Figure 1, where the
case n = 2, k = 3 is considered.
Figure 1 can also illustrate the nature of the difficulties related to the application of a binary-valued
multithreshold neuron. Its output can alternate many times as the weighted sum grows. This ensures the great capability of the
multithreshold unit on the one hand, but results in the hardness of its training on the other hand.
Note that a rigorous proof of the NP-hardness of the learning of a single binary-valued
multithreshold neuron can be found in [22].
Figure 1: Illustration of the performance of binary-valued 3-threshold neuron
3.1.2. Model of multi-valued k-threshold neuron
A multi-valued modification of the model (2) can be considered [18, 23] that keeps the
capacity of the base model and is easier to train [24]. This multithreshold model uses the
same weight vector w and threshold vector t, but differs in the output range of the neuron. To be
more precise, the range set of a k-threshold multi-valued neuron is Z_{k+1} = {0, 1, …, k}, and the
neuron output y satisfies the following condition:

y = f_t(w·x),                                                       (3)

where

f_t(x) = 0, if x < t_1,
f_t(x) = 1, if t_1 ≤ x < t_2,
........................                                            (4)
f_t(x) = k − 1, if t_{k−1} ≤ x < t_k,
f_t(x) = k, if t_k ≤ x.
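Since the activation (4) is simply the index of the interval that contains its argument, a multi-valued k-threshold neuron can be sketched in one line of NumPy (an illustrative sketch; the function name is not from the paper):

```python
import numpy as np

def multivalued_multithreshold(x, w, t):
    """Multi-valued k-threshold neuron, formulas (3)-(4).

    Returns f_t(w.x): the index in {0, 1, ..., k} of the half-open
    interval [t_j, t_{j+1}) that contains the weighted sum w.x.
    """
    return int(np.searchsorted(t, np.dot(w, x), side="right"))
```

The partition (5) of a finite set A is then obtained by grouping the points of A by this output value.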
Consider again the geometrical illustration, now for the k-threshold multi-valued neuron (3),
(4). As shown in Figure 2, the performance of the neuron is also defined by the parallel
hyperplanes H_j: w·x = t_j, j ∈ {1, …, k}, which partition the space R^n into k + 1 parts.
Figure 2: Illustration of the performance of multi-valued 3-threshold neuron
These parts are also labeled by the indices 0, 1, …, k corresponding to the output value of the
(multi-valued) neuron whose activation is given by (4). Notice that the same points are used in both
Figure 1 and Figure 2, but their partition into classes differs, because there are only two classes for
the binary-valued k-threshold neuron and k + 1 for its many-valued counterpart [22].
The pair (w, t) completely defines the multi-valued multithreshold neuron and is called its
structure pair. Let A be an arbitrary set in R^n. Then every multi-valued k-threshold neuron with
structure pair (w, t) performs the (ordered) partition (A_0, A_1, …, A_k) of the set A, where

A_i = { x ∈ A | f_t(w·x) = i }, i = 0, 1, …, k.                     (5)

This partition is called an ordered k-threshold partition of the set A, whereas the sets A_0, A_1, …, A_k
are called strongly k-separable (compare with [22]). Note that the order matters for strongly
separable sets. Sets A_0, A_1, …, A_k are called k-separable if there exists a permutation σ: Z_{k+1} →
Z_{k+1} such that the sets A_{σ(0)}, A_{σ(1)}, …, A_{σ(k)} are strongly k-separable [22].
3.2. Learning algorithms for multithreshold neurons
3.2.1. Initial reduction of the task
Let A_0, A_1, …, A_k be strongly k-separable finite sets. Consider the task of finding a multi-valued
k-threshold neuron with structure pair (w, t) that performs the desired partition
(A_0, A_1, …, A_k) of the set A that is the union of the (disjoint) sets A_0, A_1, …, A_k, which satisfies (5).

Consider how one can reduce the above task to the solution of a homogeneous system of
linear inequalities in n + k variables w_1, …, w_n, t_1, …, t_k. It is possible to rewrite (3)-(5) as follows:

w·x < t_1, if x ∈ A_0,
t_j ≤ w·x < t_{j+1}, if x ∈ A_j (1 ≤ j ≤ k − 1),                    (6)
t_k ≤ w·x, if x ∈ A_k.
Since the sets A_0, A_1, …, A_k are finite and strongly k-separable, system (6) has solutions, which
compose a convex set in R^{n+k}. If all non-strict inequalities in (6) were replaced by strict
ones, then the resulting system would also have solutions. Let

v = (w_1, …, w_n, −t_1, …, −t_k),
a_j(x_1, …, x_n) = (x_1, …, x_n, 0, …, 0, 1, 0, …, 0),              (7)

where the single 1 in the threshold part is preceded by j − 1 zeros and followed by k − j zeros.
The chained inequality t_j ≤ w·x < t_{j+1} is equivalent to the system

w·x − t_j ≥ 0,
−w·x + t_{j+1} > 0.

The last system can be rewritten in the following way:

a_j(x)·v ≥ 0,
                                                                    (8)
−a_{j+1}(x)·v > 0.

Thus, after replacing all non-strict inequalities by strict ones, we can reduce system (6) to the following system:

b_1·v > 0,
...............                                                     (9)
b_m·v > 0,

where m = 2|A| − |A_0| − |A_k| (|X| denotes the cardinality of the set X) and the vectors b_i are obtained
using (7) and (8). Note that there are algorithms solving (9) in polynomial time [13]. Thus, the
task of learning the k-threshold multi-valued neuron (3)-(4) is not NP-complete.
The reduction process can be described using the following pseudocode:
ReduceSet(A_0, A_1, …, A_k)
 1  B ← ∅
 2  for x in A_0:
 3      add −a_1(x) into B
 4  for i in 1, …, k − 1:
 5      for x in A_i:
 6          add a_i(x) into B
 7          add −a_{i+1}(x) into B
 8  for x in A_k:
 9      add a_k(x) into B
10  return B
Notice that the transformation (7) is used in steps 3, 6, 7, and 9, which fill the output
set B.
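The pseudocode above can be rendered in NumPy as follows (a sketch in the paper's notation; the helper names are mine):

```python
import numpy as np

def a(j, x, k):
    """Transformation (7): pattern x extended by a one-hot threshold part."""
    e = np.zeros(k)
    e[j - 1] = 1.0                       # the single 1 stands at threshold position j
    return np.concatenate([np.asarray(x, dtype=float), e])

def reduce_set(classes):
    """ReduceSet(A_0, ..., A_k): build the vectors b_i of system (9).

    `classes` is the ordered tuple (A_0, ..., A_k) of lists of patterns.
    Any v with b.v > 0 for all returned b encodes a structure pair via
    v = (w_1, ..., w_n, -t_1, ..., -t_k).
    """
    k = len(classes) - 1
    B = []
    for x in classes[0]:                 # w.x < t_1            (steps 2-3)
        B.append(-a(1, x, k))
    for i in range(1, k):                # t_i <= w.x < t_{i+1} (steps 4-7)
        for x in classes[i]:
            B.append(a(i, x, k))
            B.append(-a(i + 1, x, k))
    for x in classes[k]:                 # t_k <= w.x           (steps 8-9)
        B.append(a(k, x, k))
    return B
```

The size of the returned set is m = 2|A| − |A_0| − |A_k|, in agreement with (9).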
3.2.2. Online learning algorithm
Consider the training of the multi-valued k-threshold neural unit to separate finite strongly k-
separable sets A0 , A1 ,..., Ak .
Let us describe the online learning algorithm for a k-threshold multi-valued neural unit that
uses ReduceSet(A_0, A_1, …, A_k) from the previous subsection and an adapted version of the
relaxation algorithm from [31, 32]. The pseudocode of the algorithm is shown in the function
OnlineMultithreshold:
OnlineMultithreshold(A_0, A_1, …, A_k, r, v_0, η)
 1  B ← ReduceSet(A_0, A_1, …, A_k)
 2  v ← v_0
 3  (i, j, err) ← (0, 0, 1)
 4  while i < r and err > 0:
 5      err ← 0
 6      shuffle B
 7      for b in B:
 8          s ← b·v
 9          if s > 0:
10              continue
11          j ← j + 1
12          err ← err + 1
13          v ← v − (η(j)·s / ‖b‖²)·b
14      i ← i + 1
15  w ← (v_1, …, v_n)
16  t ← (−v_{n+1}, …, −v_{n+k})
17  return w, t
The previous algorithm has four main parameters: (A_0, A_1, …, A_k), an ordered partition corresponding
to strongly k-separable sets; r, the number of learning epochs; v_0, the initial approximation;
and η, the schedule function that defines the behavior of the learning rate. The above algorithm
uses three internal counters: i, responsible for learning epochs; j, responsible for learning
corrections; and err, responsible for the unit errors during the current epoch of learning. The
goal of the algorithm is the search for a vector v ∈ R^{n+k} such that for all b ∈ B the inequality b·v > 0
holds. If such a vector is already found, then the learning process terminates. Otherwise, the weight
correction occurs in step 13 at least once per epoch. Note that this correction changes v only in
the case s < 0. Thus, a random initial approximation should be used for v_0 to avoid the situation
s = 0 during the learning. The following proposition states conditions ensuring the successful
completion of the online learning using the above algorithm.
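The main loop of OnlineMultithreshold can be sketched as follows (a NumPy sketch; the reduced set B is passed in directly instead of being computed by ReduceSet, and the seeding is an added assumption for reproducibility):

```python
import numpy as np

def online_multithreshold(B, k, r, v0, eta, seed=0):
    """Online relaxation learning for system (9): seek v with b.v > 0 for all b in B.

    B : (m, n+k) array of reduced vectors, k : number of thresholds,
    r : maximal number of epochs, v0 : initial approximation,
    eta : learning-rate schedule eta(j).  Returns the structure pair (w, t).
    """
    rng = np.random.default_rng(seed)
    B = np.asarray(B, dtype=float)
    v = np.asarray(v0, dtype=float).copy()
    j = 0
    for _ in range(r):                        # learning epochs (counter i)
        err = 0
        rng.shuffle(B)                        # new random order of the rows each epoch
        for b in B:
            s = b @ v
            if s > 0:                         # inequality already satisfied
                continue
            j += 1
            err += 1
            v = v - eta(j) * s / (b @ b) * b  # relaxation correction (step 13)
        if err == 0:                          # all inequalities hold: stop early
            break
    n = v.size - k
    return v[:n], -v[n:]                      # w and t recovered from v = (w, -t)
```

For instance, for the one-dimensional sets A_0 = {0}, A_1 = {2} (so B contains the two vectors (0, −1) and (2, 1)), the procedure quickly finds w, t with 0 < t_1 and 2w_1 > t_1.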
Proposition 1. If A = A_0 ∪ A_1 ∪ … ∪ A_k, the sets A_0, A_1, …, A_k are finite and strongly k-separable,

η(j) = λ(j) + μ(j) / |s(j)|,

where j is a correction step, s(j) is the dot product obtained in step 8 before the jth correction,

0 < λ(j) ≤ λ_max < 2,  0 < μ_min ≤ μ(j),                            (10)

then there exists r such that OnlineMultithreshold produces a structure pair (w, t) of a multi-valued
k-threshold neuron, which satisfies (6) and performs the desired partition of the set A.
3.2.3. Offline learning algorithm
Let us describe the offline approach to the learning of a k-threshold multi-valued neural unit. It is
designed using a modification of the offline spectral algorithm from [32] adapted to solving the
system (9). Let B = {b_1, …, b_m} be a finite subset of R^{n+k}, and v ∈ R^{n+k}. We will need the following
notation:

s(B) = Σ_{i=1}^{m} b_i,  g_v(b) = sgn(v·b),  g_v(B) = (g_v(b_1), …, g_v(b_m)),  s_v(B) = Σ_{i=1}^{m} g_v(b_i)·b_i,  1 = (1, …, 1).

Note that both s(B) and s_v(B) belong to R^{n+k}, and the vector s_v(B) can be considered as an
analog of the Fourier coefficients of the function g_v: B → {−1, 0, 1}. Consider the following algorithm:
OfflineMultithreshold(A_0, A_1, …, A_k, r, v_0, η)
 1  B ← ReduceSet(A_0, A_1, …, A_k)
 2  v ← v_0
 3  compute s(B)
 4  compute g_v(B)
 5  j ← 0
 6  while j < r and g_v(B) ≠ 1:
 7      j ← j + 1
 8      compute s_v(B)
 9      v ← v − (η(j)·(s(B) − s_v(B))·v / ‖s(B) − s_v(B)‖²)·(s(B) − s_v(B))
10      compute g_v(B)
11  w ← (v_1, …, v_n)
12  t ← (−v_{n+1}, …, −v_{n+k})
13  return w, t
Note that OfflineMultithreshold(A_0, A_1, …, A_k, r, v_0, η) has the same input parameters as its
online counterpart from the previous subsection.
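The offline step can be sketched in the same setting (a sketch; B is again passed in directly, and the guard against a zero correction direction is an added safety check):

```python
import numpy as np

def offline_multithreshold(B, k, r, v0, eta):
    """Offline spectral-style learning for system (9), one correction per epoch.

    B : (m, n+k) array of reduced vectors, r : maximal number of corrections,
    v0 : initial approximation, eta : learning-rate schedule eta(j).
    Returns the structure pair (w, t).
    """
    B = np.asarray(B, dtype=float)
    v = np.asarray(v0, dtype=float).copy()
    sB = B.sum(axis=0)                          # s(B)
    for j in range(1, r + 1):
        g = np.sign(B @ v)                      # g_v(B), components in {-1, 0, 1}
        if np.all(g == 1):                      # every inequality b.v > 0 holds
            break
        svB = (g[:, None] * B).sum(axis=0)      # s_v(B)
        d = sB - svB                            # correction direction s(B) - s_v(B)
        if d @ d == 0:                          # degenerate direction: stop
            break
        v = v - eta(j) * (d @ v) / (d @ d) * d  # correction (step 9)
    n = v.size - k
    return v[:n], -v[n:]
```

Unlike the online version, this procedure is deterministic: it aggregates all reduced vectors into s(B) and s_v(B) and performs a single correction per pass over the data.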
The following proposition states conditions ensuring the successful completion of the offline
learning using the above algorithm.
Proposition 2. If A = A_0 ∪ A_1 ∪ … ∪ A_k, the sets A_0, A_1, …, A_k are finite and strongly k-separable,

η(j) = λ(j) + μ(j) / ( v(j − 1)·( s_{v(j)}(B) − s(B) ) ),

where j is a correction step, λ(j) and μ(j) satisfy (10), and v(j) is the value of the vector v after the jth
correction, then there exists r such that OfflineMultithreshold(A_0, A_1, …, A_k, r, v_0, η) produces the
structure pair (w, t) of a multi-valued k-threshold neuron, which performs the desired k-threshold
partition of the set A.
Proofs of both propositions are omitted. They can be obtained using arguments similar to those in [32].
4. Experiment and results
Consider the capability of our learning algorithms from the previous section to train a multi-valued
multithreshold-based classifier to solve classification problems on some benchmarks.
Let us compare their performance with well-known classification methods, such as the classical
perceptron, the nearest neighbor classifier, random forest, and a feed-forward ANN (multilayer
perceptron). The classifiers were compared on the following two real-world datasets: "balance-scale"
(Balance Scale Weight & Distance Database) and "dry-bean" (Dry Bean Dataset) [33, 34] provided
by the UC Irvine Machine Learning Repository [35]. The datasets contain 625 and 13611 learning
instances from 3 and 7 classes, respectively [33, 35]. The first dataset has 5 features, the second
one 16 [33]. 25% of the instances of every dataset were used as the test set, and the remaining 75% as the
training set. In order to obtain consistent results [12], repeated random subsampling validation
[11, 36] was used. The learning experiments were repeated 500 times for every dataset and
then the obtained results were averaged concerning the accuracy on the training and test sets.
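The validation protocol can be sketched with Scikit-Learn utilities (an illustrative sketch; the function and parameter names are mine, and the paper used 500 runs per dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

def repeated_subsampling(clf, X, y, runs=500, test_size=0.25, seed=0):
    """Repeated random subsampling validation: average the train/test
    accuracy of `clf` over `runs` independent random 75/25 splits."""
    train_acc, test_acc = [], []
    for i in range(runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed + i)
        clf.fit(X_tr, y_tr)
        train_acc.append(clf.score(X_tr, y_tr))
        test_acc.append(clf.score(X_te, y_te))
    return float(np.mean(train_acc)), float(np.mean(test_acc))
```

The same harness can be reused for every classifier in the comparison, so all models see the same sequence of random splits.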
Default values of the parameters recommended by the Scikit-Learn library were used during the training
experiments for the first four classical classifiers: 5 neighbors for the nearest neighbor classifier, 1000
iterations for the linear perceptron classifier, unbounded depth for the random forest, and one hidden layer
with 100 nodes and 200 iterations for the multilayer perceptron [36]. The constant learning rate
η = 2 was used for both multithreshold algorithms, as well as random initial approximations
w_0, t_0. The datasets are not provided with an ordered partition into classes [35], so the classes were
ordered alphabetically by their labels. The following table contains the results
of the experiments.
Table 1
Simulation results on two real-world datasets

                        Accuracy on training set (in %)    Accuracy on test set (in %)
Classifier              balance-scale     dry-bean         balance-scale     dry-bean
Perceptron              84.25             20.91            82.23             16.59
5-Nearest Neighbor      88.07             81.07            81.85             72.34
Random Forest           100               99.98            82.93             90.03
MLP Classifier          95.28             53.51            92.74             49.60
OnlineMultithreshold    88.84             57.23            83.89             51.22
OfflineMultithreshold   88.03             66.02            82.16             59.83
By analyzing the data from Table 1, we can conclude that:
• Both multithreshold algorithms performed well on the relatively easy small 3-class
classification task on the balance-scale dataset, and the online modification had the second-best
accuracy on the test set.
• Classification on the dry-bean dataset was a more difficult task for almost all classifiers
considered during the simulation. Learning for both the linear perceptron and the multilayer perceptron
failed completely. The multi-valued multithreshold neuron yielded by OfflineMultithreshold
performed better than the neuron produced by the online algorithm and had the best accuracy
among all neural-like models considered. But its accuracy was considerably
worse than that of the random forest classifier.
5. Discussions
Two versions of the learning algorithm for multi-valued multithreshold neurons have been
proposed. The simulation results show that both algorithms are capable of yielding classifiers
suitable for the solution of classification problems when the number of classes is
relatively small. But the performance of both algorithms decreases as the number of
classes grows. This seems to be due to at least two reasons.
The first one is the small number of parameters of the multithreshold model compared to
other classifiers, which often use the "one versus all" scheme [11, 36]. It seems that this drawback
can be overcome by using multithreshold networks [29] or more powerful neuron models with
multithreshold activation, e.g., polynomial neurons [23, 30, 32].
The second reason is caused by the nature of the datasets arising in the majority of classification
problems. They contain training pairs, each of which consists of a pattern and its class label. In
terms of partitions, we deal with an unordered partition, while the proposed learning algorithms
are designed to work with strongly k-separable sets corresponding to an ordered partition. The
question arises of how to convert an unordered partition into an ordered one. Brute force is not
effective due to the factorial growth of the number of possible class orderings. Numerous heuristics
can be used in order to increase the performance of multithreshold neurons. This problem
deserves a separate consideration.
6. Conclusions
The problem of the application of multithreshold multi-valued neural units has been considered.
These units separate sets of patterns in n-dimensional vector space using parallel
hyperplanes. This ability makes them candidates for the computational nodes of multiclass
ANN classifiers. Thus, the development of learning methods for such networks is important.
The simplest case of this learning problem has been treated, namely, issues concerning the
learning of a single multi-valued multithreshold neuron. Two approaches to the training of a
multithreshold neuron have been developed. Both of them require a simple preliminary
transformation of the patterns in order to reduce a given multiclass task to a corresponding binary
classification task. The online version of the learning algorithm is simpler and often faster. The offline
modification performs a single correction during each learning epoch and is usually more expensive,
but often yields a neuron with somewhat better classification accuracy. Conditions have been
stated ensuring the finiteness of the learning process in the case of the application of both algorithms
to the training on strongly k-separable sets.
References
[1] V.K. Venkatesan, M.T. Ramakrishna, A. Batyuk, A. Barna, B. Havrysh, High-Performance arti-
ficial intelligence recommendation of quality research papers using effective collaborative
approach, Systems 11.2 (2023): 81. doi:10.3390/systems11020081.
[2] I. Izonin, R. Tkachenko, S. A. Mitoulis, A. Faramarzi, I. Tsmots, D. Mashtalir, Machine learning
for predicting energy efficiency of buildings: a small data approach, in: Procedia Computer
Science, volume 231, 2024, pp. 72–77. doi:10.1016/j.procs.2023.12.173.
[3] F. Geche, V. Kotsovsky, A. Batyuk, Synthesis of the integer neural elements, in: Proceedings
of the International Conference on Computer Sciences and Information Technologies, CSIT
2015, Lviv, Ukraine, 2015, pp. 121–136. doi:10.1109/STC-CSIT.2015.7325432.
[4] M. Lupei, A. Mitsa, V. Repariuk, V. Sharkan, Identification of authorship of Ukrainian-language
texts of journalistic style using neural networks, Eastern-European Journal of Enterprise
Technologies 1.2 (103) (2020): 30–36. doi:10.15587/1729-4061.2020.195041.
[5] M. Havryliuk, N. Hovdysh, Y. Tolstyak, V. Chopyak, N. Kustra, Investigation of PNN optimi-
zation methods to improve classification performance in transplantation medicine, in: CEUR
Workshop Proceedings, volume 3609, 2023, pp. 338–345.
[6] O. Mitsa, V. Sharkan, V. Maksymchuk, S. Varha, H. Shkurko, Ethnocultural, educational and
scientific potential of the interactive dialects map, in: Proceedings of 2023 IEEE International
Conference on Smart Information Systems and Technologies (SIST), Astana, 2023, pp. 226-
231. doi: 10.1109/SIST58284.2023.10223544.
[7] V. Kotsovsky, A. Batyuk, Representational capabilities and learning of bithreshold neural
networks, in: S. Babichev et al. (Eds), Advances in Intelligent Systems and Computing, volume
1246, Springer, Cham, 2021, pp. 499–514.
[8] R. Tkachenko, An integral software solution of the SGTM neural-like structures implemen-
tation for solving different Data Mining tasks, in: S. Babichev, V. Lytvynenko (Eds.), Lecture
Notes on Data Engineering and Communications Technologies, volume 77, Springer, Cham,
2022, pp. 696–713.
[9] M. Lupei, O. Mitsa, V. Sharkan, S. Vargha, N. Lupei, Analyzing Ukrainian media texts by means
of support vector machines: aspects of language and copyright, in: Z. Hu., I. Dychka, M. He
(Eds.), Advances in Computer Science for Engineering and Education VI. ICCSEEA 2023,
Lecture Notes on Data Engineering and Communications Technologies, volume 181, Sprin-
ger, Cham, 2023, pp. 173–182.
[10] E.H. Houssein, M.E. Hosney, M.M. Emam, E.M. Younis, A.A. Ali, W.M. Mohamed, Soft compu-
ting techniques for biomedical data analysis: open issues and challenges, Artificial Intelli-
gence Review 56 (2023): 2599–2649.
[11] A. Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts,
Tools, and Techniques to Build Intelligent Systems, O'Reilly Media, Sebastopol, CA, 2022.
[12] P. Setoodeh, S. Habibi, S. Haykin, Nonlinear Filters: Theory and Applications, Wiley, New
York, NY, 2022.
[13] M. Anthony, J. Ratsaby, Large-width machine learning algorithm, Progress in Artificial In-
telligence 9.3 (2020): 275–285.
[14] V. Kotsovsky, A. Batyuk, Feed-forward neural network classifiers with bithreshold-like
activations, in: Proceedings of IEEE 17th International Scientific and Technical Conference on
Computer Sciences and Information Technologies, CSIT 2022, Lviv, Ukraine, 2022, pp. 9–12.
[15] A. Blum, R. Rivest, Training a 3-node neural network is NP-complete, Neural Networks 5.1
(1992): 117–127.
[16] S. Rajput, K. Sreenivasan, D. Papailiopoulos, A. Karbasi, An exponential improvement on the
memorization capacity of deep threshold networks, in: Advances in Neural Information
Processing Systems, volume 16, 2021, pp. 12674–12685.
[17] Z.-G. Zhang, Y.-L. Xiao, J. Zhong, Unitary learning in conditional models for deep optics neural
networks, in: Proceedings of SPIE – The International Society for Optical Engineering, volu-
me 12565, 2023, no. 1256543.
[18] N. Jiang, Z. Zhang, X. Ma, J. Wang, Y. Yang, Analysis of nonseparable property of multi-valued
multi-threshold neuron, in: Proceedings of 2008 IEEE International Joint Conference on
Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China,
2008, pp. 413–419. doi:10.1109/IJCNN.2008.4633825.
[19] D. R. Haring, Multi-threshold threshold elements, IEEE Transactions on Electronic Com-
puters EC-15.1 (1966): 45–65.
[20] I. Prokíc, Characterization of multiple-valued threshold functions in the Vilenkin-Chresten-
son basis, Journal of Multiple-Valued Logic and Soft Computing 34.3-4 (2020): 223–238.
[21] R. Takiyama, The separating capacity of a multithreshold threshold element, IEEE Trans-
actions on Pattern Analysis and Machine Intelligence. PAMI-7.1 (1985): 112–116.
[22] V. Kotsovsky, A. Batyuk, Multithreshold neural units and networks, in: Proceedings of IEEE
18th International Conference on Computer Sciences and Information Technologies, CSIT
2023, Lviv, Ukraine, 2023, pp. 1-5, doi: 10.1109/CSIT61576.2023.10324129.
[23] N. Jiang, Y. X. Yang, X. M. Ma, and Z. Z. Zhang, Using three layer neural network to compute
multi-valued functions, in 2007 Fourth International Symposium on Neural Networks, June
3-7, 2007, Nanjing, P.R. China, Part III, LNCS 4493, 2007, pp. 1-8.
[24] M. Anthony, Learning multivalued multithreshold functions, CDMA Research Report No. LSE-
CDMA-2003-03, London School of Economics, 2003.
[25] V.K. Venkatesan, I. Izonin, J. Periyasamy, A. Indirajithu, A. Batyuk, M.T. Ramakrishna, Incor-
poration of energy efficient computational strategies for clustering and routing in hetero-
geneous networks of smart city, Energies 15.20 (2022): 7524.
[26] S. Olafsson, Y. S. Abu-Mostafa, The capacity of multilevel threshold function, IEEE Trans-
actions on Pattern Analysis and Machine Intelligence 10.2 (1988): 277–281.
[27] I. Izonin, B. Ilchyshyn, R. Tkachenko, M. Gregus, N. Shakhovska, C. Strauss, Towards data nor-
malization task for the efficient mining of medical data, in: Proceedings of 12th International
Conference on Advanced Computer Information Technologies, ACIT 2022, Ruzomberok,
Slovakia, 2022, pp. 480–484.
[28] V. Kotsovsky, Hybrid 4-layer bithreshold neural network for multiclass classification, in:
CEUR Workshop Proceedings, volume 3387, 2023, pp. 212–223.
[29] E. B. Baum, On the capabilities of multilayer perceptrons, Journal of Complexity 4.3 (1988):
193–215.
[30] V. Kotsovsky, A. Batyuk, V. Voityshyn, On the size of weights for bithreshold neurons and
networks, in: Proceedings of IEEE 16th International Conference on Computer Sciences and
Information Technologies, CSIT 2021, Lviv, Ukraine, 2021, volume 1, pp. 13–16.
[31] S. Dasgupta, S. Sabato, Robust learning from discriminative feature feedback, in: Proceedings
of Machine Learning Research, volume 108, 2020, pp. 973–982.
[32] V. Kotsovsky, A. Batyuk, On-line relaxation versus off-line spectral algorithm in the learning
of polynomial neural units, in: S. Babichev et al., (Eds.), Communications in Computer and
Information Science, volume 1158, Springer, Cham, 2020, pp. 3–21.
[33] OpenML: A worldwide machine learning lab, 2024. URL: https://openml.org.
[34] M. Lupei, M. Shlahta, O. Mitsa, Y. Horoshko, H. Tsybko, V. Gorbachuk, Development of an
interactive map within the implementation of actual state and public directions, in: Procee-
dings of the 12th International Conference on Advanced Computer Information Technolo-
gies, ACIT 2022, Ruzomberok, Slovakia, 2022, pp. 384–387.
[35] M. Kelly, R. Longjohn, K. Nottingham, The UCI machine learning repository, 2023. URL:
http://archive.ics.uci.edu.
[36] S. Pölsterl, Scikit-survival: A library for time-to-event analysis built on top of Scikit-learn,
Journal of Machine Learning Research 21 (2020): 1–6.