<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Information Control Systems &amp; Technologies</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Synthesis of a 3-Layer Neural Network Classifier with the Bithreshold First Hidden Layer</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vladyslav Kotsovsky</string-name>
          <email>vladyslav.kotsovsky@uzhnu.edu.ua</email>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>2</volume>
      <fpage>4</fpage>
      <lpage>26</lpage>
      <abstract>
        <p>The paper treats the application of the bithreshold approach in the design of neural network classifiers. A novel hybrid 3-layer neural network model is proposed whose first hidden layer consists of bithreshold neurons, and the second hidden layer employs the softmax activation function. This model is intended to solve multiclass classification tasks. A supervised synthesis algorithm is designed for this neural network architecture. It consists of two stages. During the first stage a given training pattern is separated from representatives of other classes using single-threshold neurons, which are gradually converted into bithreshold neural units. In the second stage, the network design procedure reduces the size of network hidden layers in order to simplify the network and enhance the recognition ability. The performance of the proposed model is compared with that of several popular machine learning classifiers on a real-world dataset. Simulation results on optical recognition of handwritten digits benchmark demonstrate that the developed neural network model is suitable for multiclass classification tasks.</p>
      </abstract>
      <kwd-group>
        <kwd>bithreshold neuron</kwd>
        <kwd>multithreshold neuron</kwd>
        <kwd>neural network</kwd>
        <kwd>classification</kwd>
<kwd>machine learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Neural networks play a leading role in modern machine learning due to their ability to model
complex, non-linear relationships [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. They have become the foundation of many state-of-the-art
systems in image recognition [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], natural language processing [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], game playing [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and forecasting
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Their flexible architectures [
        <xref ref-type="bibr" rid="ref1 ref7">1, 7</xref>
        ] and capacity to learn from large datasets [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] have made them
essential tools in advancing artificial intelligence [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ].
      </p>
      <p>
        Activation functions are crucial for introducing non-linearity into neural networks [
        <xref ref-type="bibr" rid="ref1 ref11">1, 11</xref>
        ],
allowing them to solve complex tasks [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. They determine how signals are transformed and
propagated through the network [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The choice of activation function can significantly impact the
performance and convergence of a model [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ]. Without activation functions, neural networks
would behave like simple linear models [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        The multithreshold approach in neural computation arose as one of the first attempts to enhance the
ability of classical activation functions (such as the Heaviside or sign functions [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) by using two or
more thresholds [16, 17]. Modern applications concern smoothed continuous modifications of
multithreshold activations [18], which can outperform modern activations such as ReLU [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and Swish
[19], and are capable of increasing the recognition ability of a neural network [21] by the proper use of
additional thresholds [23]. Moreover, the application of multithreshold neural networks in pattern
classification may significantly reduce the network complexity [
classification may significantly reduce the network complexity [
        <xref ref-type="bibr" rid="ref12 ref3">3, 12</xref>
        ].
      </p>
      <p>It should be noted that all known approaches to the learning or synthesis of multithreshold
multilayer neural models have relied on offline or batch learning modes. The objective of the present
research is the design of a bithreshold neural network (NN) model suitable for online learning.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>The first studies of binary-valued multithreshold neural units were conducted by D. R. Haring [16] and
led to the development of the so-called multithreshold logic (see [24-26] for further references).</p>
      <p>The initial motivation for multithreshold units stemmed from the belief that multiple thresholds
could considerably increase the computational capabilities of a single unit [16], a hypothesis later confirmed
by counting arguments in [20] and [17]. However, early research lacked practical training methods,
and only a few heuristics were proposed. Learning binary multithreshold models proved difficult due
to NP-hardness, even for bithreshold systems [23, 26].</p>
      <p>Interest in multithreshold approaches re-emerged two decades later, driven by the development
of multi-valued neurons and formal definitions of multithreshold functions [17, 21, 25]. Key
contributions by Z. Obradovic, I. Parberry, A. Ngom, and M. Anthony gave theoretical justification
of online learning algorithms based on incremental correction [21], followed by improved online
and offline methods using relaxation techniques [23]. Note that these algorithms were
designed for the learning of a single multithreshold neural unit.</p>
      <p>
        Recent works [
        <xref ref-type="bibr" rid="ref11">11, 27</xref>
        ] explored network architectures with multithreshold hidden layers, showing
strong performance in classification tasks thanks to offline synthesis algorithms [22, 27]. Hybrid
models combining multithreshold, bithreshold, WTA, and single-threshold units further boosted
accuracy [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Using two thresholds offers a balance between expressive power and synthesis
simplicity [22], though more recent studies expanded to models with multiple thresholds and
multivalued outputs [23]. Generalized versions of these models were applied to pattern classification [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>The multithreshold paradigm also includes hardware implementations, such as those by T. Gowda
[28] and M. Nikodem [29]. Applications in regression are relatively rare, as discrete-valued
activations are better suited to classification tasks. However, both binary- and continuous-valued
multithreshold-based regressors have recently been proposed. A model of a neural network
regressor that uses binary-valued bithreshold units was designed in [30], whereas its
continuous-valued generalization within the gradient-based learning framework was proposed in [18].</p>
    </sec>
    <sec id="sec-3">
      <title>3. Models and methods</title>
      <sec id="sec-3-1">
        <title>3.1. Explanation of the bithreshold approach</title>
      </sec>
      <sec id="sec-3-2">
        <title>3.1.1. Model of a bipolar-valued bithreshold neural unit</title>
        <p>The key feature of the proposed classifier model is its hidden layer consisting of binary-valued
bithreshold neurons. Let us consider a model of such a neuron. It is a binary-valued computational
unit [22] with a weight vector $w = (w_1, \dots, w_n) \in \mathbb{R}^n$ and two thresholds $t_1, t_2$ ($t_1 \le t_2$), whose single
bipolar output $y$ is obtained by applying the following activation function</p>
        <p>
\[ f_{t_1,t_2}(s) = \begin{cases} +1, \text{ if } t_1 \le s \le t_2, \\ -1, \text{ otherwise} \end{cases} \tag{1} \]
to the weighted sum $w \cdot x = w_1 x_1 + \dots + w_n x_n$. Note that in the case $t_2 = +\infty$ the function (1) behaves like a
sign function (applied to the difference $w \cdot x - t_1$). Therefore, the single-threshold linear neural unit is a
particular case of the bithreshold neural unit. Equation (1) gives the simplest kind of multithreshold
activation function, which is a particular case of the general model of multithreshold activation
function considered in [26]. The short notation $BN(w, t_1, t_2)$ will be used for such a neural unit. It
should be mentioned that the use of the bipolar output range $\{-1, +1\}$ instead of the more usual binary
outputs is intentional, because it allows us to avoid the need for an additional normalization level in
the networks used in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
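        <p>As a minimal illustration, the activation (1) can be sketched in Python as follows (our own sketch; the name bithreshold_output is not from the paper):</p>
        <preformat>
import numpy as np

def bithreshold_output(w, t1, t2, x):
    """Output of BN(w, t1, t2) on input x: +1 iff the weighted sum w.x
    lies in the segment [t1, t2], and -1 otherwise (equation (1))."""
    s = float(np.dot(w, x))          # weighted sum w.x = w1*x1 + ... + wn*xn
    return 1 if (s >= t1) and (t2 >= s) else -1

# A single-threshold unit is the particular case t2 = +infinity:
# bithreshold_output(w, t1, np.inf, x) acts as a sign function of w.x - t1.
        </preformat>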
        <p>Consider the geometrical concepts related to the performance of a bithreshold neural unit. The pair of
parallel hyperplanes $w_1 x_1 + \dots + w_n x_n = t_1$ and $w_1 x_1 + \dots + w_n x_n = t_2$ divides the n-dimensional space $\mathbb{R}^n$
into three parts. Thus, the activation function (1) allows us to distinguish all points located in the
middle region (i.e., between the hyperplanes) from all other points of the space. This induces so-called
bithreshold separable sets in the space $\mathbb{R}^n$. Therefore, a bithreshold neural unit can act as a simple
binary classifier. It is evident that the classification capacity of a single bithreshold neuron is very
limited. It is unable to solve relatively simple classification tasks related to dichotomies of a small finite
set of n-dimensional patterns. Furthermore, general multiclass classification tasks are more
frequent and usual than binary ones. This implies that binary-valued bithreshold neurons must be
combined within a neural network in order to solve real-world problems.</p>
        <p>Let S = (x1, y1), , (xm , ym ) denote a training sample containing m training pairs (xi, yi), where
xi is an n-dimensional real vector (xi Rn) feature vector, yi is a non-negative integer label
be</p>
        <p>K}, where i = m, and K is the number of classes. In practice, S contains
the part of an available dataset, because its remainder can be reserved for special purpose, e.g., for
the test set.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.1.2. Illustration of the idea behind the performance of a bithreshold classifier</title>
        <p>Let us consider an example of how bithreshold neurons can be useful for multiclass classification.
Consider a simple example of ternary classification in two dimensions (K = 3, n = 2). Figure 1 shows
three pairwise linearly non-separable classes of 2-dimensional patterns C1, C2 and C3 as well as 12
representative patterns, which form the training sample together with their labels and are
consecutively fed to the learner's input (in Figure 1, for brevity, patterns are referred to only by their
numbers instead of the full notation $x_1, \dots, x_{12}$).</p>
        <p>Let us consider how these patterns can be separated using bithreshold neurons. Note that in two
dimensions the notion of a hyperplane coincides with that of a line. It is evident that patterns 1 and 2 are
members of different classes and are linearly separable. The third pattern belongs to class C1, as does the
first one. We can properly separate these three patterns using a single bithreshold neuron BN1, as shown in Figure 2.</p>
        <p>The bithreshold unit BN1 is defined by a pair of parallel lines l11 and l12. It is evident that pattern 4
cannot be correctly classified using BN1, because it lies in the same region as pattern 3 (actually,
pattern 1 lies in the same region with respect to the output of BN1). Thus, an additional line is
necessary to separate it from patterns 1 and 3. A possible solution is illustrated in Figure 3. Note that
a single (dotted) line l2 is used, corresponding to a single-threshold neuron TN2.</p>
        <p>Let us assume that a new point always falls within the positive half-plane defined by a single
separating line. Assume that single lines, like l2, can later be complemented with another parallel line
in order to form a full bithreshold neuron instead of a single-threshold one. This strategy provides
benefits in terms of memory capacity in the general n-dimensional case for large n, as it requires a
single additional scalar parameter for another threshold, rather than n + 1 new parameters in the
case when a new complete hyperplane is employed. Note that this approach enabled us to obtain the
second line l12 with the intention of extending the single-threshold neuron corresponding to the line l11 into the
bithreshold neuron BN1.</p>
        <p>Consider pattern 5. It belongs to the same plane region as pattern 4. Thus, it must be separated
by an additional line. This cannot be done using a line parallel to l2, i.e., by extending TN2 to some
bithreshold unit. Let this line be l3, as shown in Figure 4. This line corresponds to some
single-threshold unit TN3. Pattern 6 falls in the same region as pattern 5; thus, no additional separation is
required. Pattern 7 belongs to class C3 but lies in the same region as the previous two patterns.
Therefore, we need to separate pattern 7 from them. A possible solution involves a new line, l4, as
shown in Figure 5.</p>
        <p>Pattern 8 conflicts with pattern 2. They can be separated by drawing the line l42, which is parallel
to l4, as shown in Figure 6. This does not affect the separability of patterns 5 and 6 from pattern 7.
Thus, the pair of parallel lines l41 (l4 in Figure 5) and l42 defines the bithreshold neuron BN4.</p>
        <p>Consider the last three patterns. Notice that when dealing with the separation of many patterns,
it is not always easy to determine whether these patterns are properly separated using bithreshold
neurons. This problem can be solved by analyzing the output of all neurons involved in the
separation process. The outputs of the neurons from Figure 6 are presented in Table 1.</p>
        <p>In the case of proper pattern separation, no two identical output rows correspond to patterns
from different classes. This is true for the first eight patterns. The output row for pattern 9 is unique;
hence, this pattern is correctly separated. The same is true for pattern 10. Consider pattern 11. Its
row of outputs is identical to the row corresponding to pattern 4. However, this is acceptable because
both patterns belong to the same class C3. Note again that these patterns fall into different
regions of the plane in Figure 6, but these regions are identical with respect to the outputs of the neurons.
This apparent ambiguity arises from the fact that both the first and the third region induced by a single
BN produce the same output -1.</p>
        <p>Consider the last pattern. It belongs to class C2 and shares identical outputs with pattern 10, which
represents class C1. Thus, an additional line is required to separate these two patterns. This can be
done by extending one of the single-threshold neurons, TN2 or TN3. Let us extend TN3 into BN3, as shown
in Figure 7.</p>
        <p>Note that replacing TN3 with BN3 results in a change of the output table. The values in
the column BN3 are negated for patterns 2, 4, 5, 6, 7, 8, 9, and 11. The updated outputs are presented
in Table 2.</p>
        <p>(Table 2: the updated outputs of the neurons; its label column lists the classes of patterns 1-12: C1, C2, C1, C3, C2, C2, C3, C1, C3, C1, C3, C2.)</p>
        <p>It is evident from Table 2 that the above changes do not cause any new conflicts, while patterns
10 and 12 now have different rows of outputs.</p>
        <p>All neurons obtained during the separation phase are potential candidates for use in a future
classifier (as we will see later, they are useful in the first hidden layer of the corresponding neural
network). However, the set of neurons obtained during the separation may be redundant. This means
that some subset of neurons may perform the same separation as the full set does. Let us return to
our example. Suppose that BN1 is excluded from the separation process. This results in the removal
of the corresponding column from Table 2. The result is shown in Table 3.</p>
        <p>It is easy to verify that there are no equal rows for patterns belonging to different classes. Thus,
it is possible to remove BN1 without the loss of separability of all patterns presented in the training
sample (the redundancy of this neuron follows from the fact that it separates only patterns 1 and 3
from pattern 2, but BN4 also does it). The other three neurons are significant because their removal
breaks the valid separability.</p>
        <p>The table of outputs can be used in order to produce a classifier, but it requires simplification to
reduce its size.</p>
        <p>Let us remove the duplicate rows of outputs (in the table corresponding to a valid separation,
such rows are possible only for patterns that are members of the same class). There are four pairs of
identical rows highlighted in Table 3: namely, 3, 8 for C1; 4, 11 for C3; 2, 12 and 5, 6 for C2. Thus, it is
possible to safely remove rows 6, 8, 11, and 12 without loss of information. The result is presented in
Table 4, where patterns are grouped by class.</p>
        <p>Note that Table 4 contains all eight 3-dimensional bipolar vectors. Therefore, it can be used to
classify any new 2-dimensional pattern P. To do so, we simply compute the outputs of TN2, BN3 and
BN4, respectively, and assign the pattern to the class whose representative in Table 4 shares the same
output row as P. The plane partition performed by this classifier is shown in Figure 8. For the given P, the
outputs of the neurons are (1, 1, 1). This vector matches the penultimate row of Table 4, which
corresponds to a representative of class C3. Therefore, our classifier would assign pattern P to class C3.</p>
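        <p>In code, such a classification step could look as follows (a sketch under our own naming; the dot-product rule used below coincides with exact row matching whenever the row of P is present in the table, as it is here, and anticipates the proximity rule of Section 3.2.1):</p>
        <preformat>
import numpy as np

def classify(p, neurons, table_rows, table_labels):
    """neurons: callables mapping a pattern to +1 or -1 (TN2, BN3, BN4 here);
    table_rows, table_labels: representative rows of Table 4 and their classes."""
    z = np.array([f(p) for f in neurons])      # output row of the new pattern P
    # pick the stored row closest to z (equivalently, with maximal dot product)
    best = max(range(len(table_rows)),
               key=lambda i: float(np.dot(table_rows[i], z)))
    return table_labels[best]
        </preformat>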
        <p>It is clear from Figure 8 that the resulting class boundaries are only a rough approximation.
Consequently, the corresponding classifier may perform with low accuracy. This can be explained by the
small size of the training sample (only 12 patterns were used for illustration purposes) as well as by certain
properties of the bithreshold activation function (1). These issues will be discussed later.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.2. Hybrid 3-layer neural network classifier with hidden bithreshold layer</title>
      </sec>
      <sec id="sec-3-5">
        <title>3.2.1. Architecture of classifier</title>
        <p>Consider a neural-like multilayer feed-forward model of a 3-layer classifier whose principles of
operation were described in the previous subsection. It is evident that the first hidden layer (i.e., the first
network layer after the input layer, which is not counted) must consist of neurons that
provide the desired separation of the training patterns. This layer contains bipolar-valued bithreshold
nodes as well as single-threshold ones. Let N be the number of these nodes. Then, the first layer
performs a mapping from $\mathbb{R}^n$ to $\{-1, +1\}^N$.</p>
        <p>The second hidden layer contains M nodes and serves as a bridge between the bithreshold layer and
the output layer. It is constructed using the output table associated with the first hidden layer,
which can be considered as an M × N bipolar matrix V consisting of unique rows $v_1, \dots, v_M$, each of
which is an N-dimensional bipolar vector containing the outputs of all first-layer neurons. It is
evident that $M \le \min\{m, 2^N\}$. Let us assume that the first $M_1$ rows of the matrix V
correspond to class C1, the next $M_2$ rows to class C2, and so on up to the last $M_K$ rows corresponding to class $C_K$,
where $M_1 + \dots + M_K = M$ and $M_i \ge 1$, $i = 1, \dots, K$. Let $I_k$ denote the set of indices of all rows corresponding to class $C_k$,
where $k = 1, \dots, K$. For the sake of brevity, assume that all redundant patterns have already been
removed from the training sample, so that M = m. Assume that the ith unit in the second layer
corresponds to the vector $v_i$ and can recognize the input pattern $x_i$ with the class label $y_i$. Therefore,
this unit may compare the output vector z = z(x) of the first layer with the vector $v_i$, activate
itself only in the case $z = v_i$ and transmit its activation to the next layer. However, this
exact-match approach cannot be applied in practice. This is caused by two reasons: 1) there is no guarantee
that for each of the $2^N$ possible bipolar vectors z there exists a corresponding training
pattern; 2) for large datasets the size N of the first layer may be so large that the use of a second
layer consisting of $2^N$ units is not feasible. Thus, the relationship between z and $v_i$ should use
proximity instead of equality. Therefore, only the unit for which the distance
between z and $v_i$ is minimal will be activated. If the distance is defined as the Euclidean distance, such behavior can be
implemented using a layer of neural units with an appropriate activation. It follows from the fact
that</p>
        <p>
\[ \| z - v_i \|^2 = \| z \|^2 - 2 z \cdot v_i + \| v_i \|^2 = 2N - 2 v_i \cdot z. \]
In the last equation, we used the fact that the squared Euclidean norm of an N-dimensional bipolar vector is equal
to N. Therefore, the minimization of $\| z - v_i \|$ is equivalent to the maximization of the dot product
$v_i \cdot z$. Thus, it is possible to use a layer with weight matrix V and the WTA (winner-takes-all) activation mode for this
purpose, in which the maximum layer output is transformed to 1, and all others are set to 0. An
alternative approach consists in the use of the softmax activation mode instead of WTA, which provides a
smoothed continuous version of WTA. This mode preserves the possibility of membership in
several classes for patterns lying near the decision boundaries of the classifier. Let $s(z) = (s_1(z), \dots, s_M(z))$
denote the output of the second layer given the input z. Then,
\[ s_i(z) = \frac{\exp(v_i \cdot z)}{S(z)}, \quad \text{where } S(z) = \sum_{j=1}^{M} \exp(v_j \cdot z), \quad i = 1, \dots, M. \tag{2} \]</p>
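        <p>A sketch of the second layer under these definitions (the softmax mode of equation (2); the subtraction of the maximum is a standard numerical-stability trick, not part of the model, and does not change the softmax value):</p>
        <preformat>
import numpy as np

def second_layer(z, V):
    """z: N-dimensional bipolar output of the first layer;
    V: M x N bipolar matrix whose rows are v_1, ..., v_M.
    Returns s(z) of equation (2), the softmax of the dot products v_i . z."""
    a = V @ z            # a_i = v_i . z, maximal where ||z - v_i|| is minimal
    a = a - a.max()      # stabilization; softmax is shift-invariant
    e = np.exp(a)
    return e / e.sum()
        </preformat>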
        <p>The (third) output layer of the classifier traditionally contains as many nodes as there are
distinct classes. It uses linear neural units without biases and activation functions. The weight matrix
$U = (u_{ki})$ ($k = 1, \dots, K$, $i = 1, \dots, M$) of the last layer is predefined as follows:
\[ u_{ki} = \begin{cases} 1, \text{ if } i \in I_k, \\ 0, \text{ otherwise.} \end{cases} \tag{3} \]</p>
        <p>Thus, from (2) and (3) we may conclude that
\[ y_k(x) = \sum_{i=1}^{M} u_{ki} s_i(z(x)) = \sum_{i \in I_k} s_i(z(x)), \quad k = 1, \dots, K, \qquad \sum_{k=1}^{K} y_k = 1. \]
It follows from the last equation that the kth output node indicates the predicted probability that the input
x is a member of class $C_k$, $k = 1, \dots, K$.</p>
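        <p>Putting the three layers together, the complete forward pass can be sketched as follows (our own sketch under the stated definitions of W, V and U; the vectors T1 and T2 collect the first-layer thresholds, with T2[j] set to +infinity for a single-threshold unit):</p>
        <preformat>
import numpy as np

def forward(x, W, T1, T2, V, U):
    """W: N x n first-layer weights; T1, T2: N-dimensional threshold vectors;
    V: M x N bipolar matrix; U: K x M binary matrix of equation (3).
    Returns the vector (y_1(x), ..., y_K(x)), which sums to 1."""
    s = W @ x                                              # first-layer weighted sums
    z = np.where(np.logical_and(s >= T1, T2 >= s), 1, -1)  # bipolar outputs, eq. (1)
    a = V @ z
    a = a - a.max()
    soft = np.exp(a) / np.exp(a).sum()                     # second layer, eq. (2)
    return U @ soft                                        # y_k(x), summed over I_k
        </preformat>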
      </sec>
      <sec id="sec-3-6">
        <title>3.2.2. Network synthesis algorithm</title>
        <p>Consider how to synthesize the neural network classifier described in Section 3.2.1. The key question is the
construction of the first network layer, which provides the desired separation of the training patterns.</p>
        <p>The synthesis algorithm consists of two stages. The first stage is the most important one and consists in
the separation of the representatives of every class. During this stage, the current input pattern $x_i$
($i = 2, \dots, m$) must be separated from the patterns that belong to other classes and have already been
processed during the synthesis procedure. The conceptual scheme of this process is illustrated in
Figure 9.</p>
        <p>(Figure 9, flowchart of the first stage: compute the output vector z for x; if there are no conflicts, the processing of x ends; otherwise, try to complete some LN to a BN; if no such trial is successful, add a new LN and recalculate the output table.)</p>
        <p>The process begins by computing the output vector $z_i$, which consists of the outputs of all neurons
that have already been included in the first layer. If no conflict occurs (i.e., $z_i \ne z_j$ for all $j$ such that
$1 \le j \le i-1$ and $y_j \ne y_i$), then no changes are required. Otherwise, additional actions are necessary to
resolve the conflicts for some patterns $x_i$ and $x_j$ such that $z_i = z_j$ and $y_j \ne y_i$. First, the synthesis system
attempts to resolve the conflict between the ith and jth patterns without inserting new nodes into the first
layer. It checks whether there exists a single-threshold neural unit LN(w, t) that can be extended to a
$BN(w, t_1, t_2)$ with some new thresholds $t_1$ and $t_2$ in such a way that $f_{t_1,t_2}(w \cdot x_i) \ne f_{t_1,t_2}(w \cdot x_j)$ and the
replacement of LN(w, t) with $BN(w, t_1, t_2)$ does not cause any conflict among the already separated
patterns $x_1, \dots, x_{i-1}$. If this is impossible, then the algorithm proceeds by adding a new single-threshold
neuron LN(w, t) to the layer, where</p>
        <p>
\[ w = 2(x_i - x_j), \qquad t = \| x_i \|^2 - \| x_j \|^2. \tag{4} \]
Such a choice ensures consistent separation, because $w \cdot x_i > t > w \cdot x_j$. The last inequalities follow
from the fact that
\[ w \cdot x_i - t = t - w \cdot x_j = \| x_i - x_j \|^2 > 0, \qquad t = \tfrac{1}{2}(w \cdot x_i + w \cdot x_j). \]
Note that if the input patterns have integer coordinates, then both w and t in (4) are also integers,
which may be advantageous for certain implementation purposes [31].</p>
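        <p>The choice (4) is easy to verify numerically; a sketch of the step that inserts a new single-threshold neuron (a hypothetical helper, not the author's code):</p>
        <preformat>
import numpy as np

def new_linear_neuron(x_i, x_j):
    """Weights and threshold of equation (4); guarantees w.x_i > t > w.x_j
    for any distinct patterns x_i and x_j."""
    w = 2.0 * (x_i - x_j)
    t = float(np.dot(x_i, x_i) - np.dot(x_j, x_j))   # ||x_i||^2 - ||x_j||^2
    # both margins equal ||x_i - x_j||^2, which is positive when x_i != x_j
    assert np.isclose(np.dot(w, x_i) - t, t - np.dot(w, x_j))
    return w, t
        </preformat>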
        <p>The second stage of the synthesis algorithm aims to simplify the first layer of the network by
eliminating redundant neurons and duplicate rows in the output matrix V. This stage results in an
N × n real matrix W, the weight matrix of the first network layer, as well as in an M × N bipolar matrix V,
the weight matrix of the second network layer. The binary K × M weight matrix U of the output layer is
defined by (3).</p>
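        <p>A naive sketch of this second stage: greedily drop a first-layer neuron (a column of the output table) whenever the remaining columns still separate the classes, then remove duplicate rows (this assumes the separation_is_valid helper sketched in Section 3.1.2):</p>
        <preformat>
import numpy as np

def simplify(V0, labels):
    """V0: m x N bipolar output table from the separation stage;
    labels[i] is the class of pattern i. Returns the kept column
    indices and the deduplicated rows with their labels."""
    keep = list(range(V0.shape[1]))
    for col in list(keep):                      # try to remove each neuron in turn
        trial = [c for c in keep if c != col]
        rows = [tuple(r) for r in V0[:, trial]]
        if trial and separation_is_valid(rows, labels):
            keep = trial                        # the neuron was redundant
    # drop duplicate rows (possible only among patterns of the same class)
    reduced, seen = [], set()
    for r, y in zip(V0[:, keep], labels):
        key = tuple(r)
        if key not in seen:
            seen.add(key)
            reduced.append((key, y))
    return keep, reduced
        </preformat>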
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiment</title>
      <p>The performance of the proposed 3-layer neural network classifier was compared with the performance
of several popular classifiers in order to estimate its ability to solve the classification task on the
optdigits dataset, which is considered a small-sized benchmark in pattern classification [32]. This
dataset contains 1797 8 × 8 images of handwritten digits belonging to one of 10 classes, with no missing values. It is a
copy of the test set of the UCI ML hand-written digits dataset [33] with reduced dimensionality and
some invariance to small distortions. All 64 input attributes are integers in the range 0-16. A more
detailed description of the dataset is available in [32].</p>
      <p>
        During the simulation, the performance of the following four popular and three bithreshold-like models was
compared. The popular classifiers were: decision tree, random forest (an averaging algorithm based on
randomized decision trees) [32], LinearSVC (a support vector machine with a linear kernel) [32] and
MLPClassifier (a multilayer perceptron classifier) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>The scikit-learn library [32] implementations of the popular classifiers were used with the recommended
parameter settings. The 2-layer bithreshold NN classifier [22] was studied, as well as the 3-layer NN
classifier described in the third section. The 2-layer model was tested with different values of its
hyperparameter. Two versions of the 3-layer classifier were studied. The first of them used only
single-threshold neurons in the first hidden layer. The second version used both single-threshold and
bithreshold neurons in this layer.</p>
      <p>
        The main classification metrics were used: accuracy, precision, recall, and F1 score [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The last three
metrics were calculated for each label, and their unweighted means were found. This strategy is quite
reasonable, as there was no significant imbalance between class sizes in the training sample [33].
5-fold cross-validation [
        <xref ref-type="bibr" rid="ref1">1, 32</xref>
        ] was applied in order to obtain representative results. The simulation
results are presented and discussed in the following section.
      </p>
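      <p>The comparison with the popular classifiers can be reproduced along the following lines (a sketch with scikit-learn defaults; the exact hyperparameter settings used in the experiment are not reproduced here):</p>
      <preformat>
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)     # 1797 8 x 8 digit images, features in 0..16
scoring = ["accuracy", "precision_macro", "recall_macro", "f1_macro"]
for model in (DecisionTreeClassifier(), RandomForestClassifier(),
              LinearSVC(), MLPClassifier(max_iter=1000)):
    cv = cross_validate(model, X, y, cv=5, scoring=scoring)   # 5-fold CV
    print(type(model).__name__,
          {m: round(cv["test_" + m].mean(), 3) for m in scoring})
      </preformat>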
    </sec>
    <sec id="sec-5">
      <title>5. Results and discussion</title>
      <p>Simulation results are shown in Table 5. By analyzing them, it is possible to conclude that:
1. All but one model demonstrated a higher level of precision compared to the other metrics but, in
general, the differences between the metric scores were not significant.
2. Neural-based models outperformed the other classifiers. The MLP classifier achieved the best results
by all four main metrics.
3. The 3-layer bithreshold NN was the third best by all metrics.
4. The results of the random search technique showed that the value 1 of the hyperparameter of the 2-layer model is preferable. Larger values
did not improve the performance.
5. Both 3-layer networks outperformed the 2-layer one.
6. The use of bithreshold neurons in the first network layer resulted in a significant
improvement of performance. Moreover, the 3-layer bithreshold NN had on average 19% fewer
neurons in the first layer, and the size of its second layer was approximately 41% smaller
compared to the 3-layer single-threshold NN.</p>
      <p>The 3-layer NN classifier design employs the synthesis approach. As a result, the sizes of the
hidden layers depend on the specific dataset used, and even on the order in which the patterns are
selected during synthesis. The experimental results show that the size of the first network layer is quite acceptable
(e.g., a few dozen units for the digits dataset, typically in the range 21-40).</p>
      <p>It seems that the main drawback of the proposed classifier model is the large size of its second
layer, which was enormous for the single-threshold version (over 1,000) and also excessively large in the
case of the bithreshold modification. This is due to the fact that the number of duplicate rows was not
very large (between 109 and 816). Therefore, the second layer may remain very large.</p>
      <p>Note that the second stage (network simplification) did not significantly impact the performance of the
classifier during the experiment (it caused a decrease in accuracy of 1%-12%). The reasons for such a
performance degradation are unclear and require further investigation.</p>
      <p>The comparison of the 2-layer bithreshold NN model proposed in [22] with the current 3-layer model
is also noteworthy. The results of the above experiment showed that the 3-layer model had better
prediction accuracy on new data. Nevertheless, the 2-layer model is much more compact and can be
used for larger datasets than the 3-layer one. Unlike that of the 2-layer model, the computational
complexity of the proposed synthesis algorithm has not yet been analyzed and remains an open
question.</p>
      <p>The experiment also showed that the second stage of the synthesis algorithm can be significantly
more expensive than the first (separation) stage. This is due to the need to search for identical rows in
the output table, which can be quite large. The simplest implementation uses two nested loops and
can be very slow. This limitation can be partially bypassed by applying hashing to the set of matrix
rows.</p>
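      <p>A minimal sketch of the hash-based duplicate search, which replaces the quadratic nested-loop scan with an expected-linear pass:</p>
      <preformat>
def duplicate_rows(rows):
    """rows: iterable of output-table rows (convertible to tuples).
    Returns the indices of rows that repeat an earlier row."""
    first_seen, duplicates = {}, []
    for i, row in enumerate(rows):
        key = tuple(row)                 # tuples are hashable dictionary keys
        if key in first_seen:
            duplicates.append(i)
        else:
            first_seen[key] = i
    return duplicates
      </preformat>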
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>The applications of neural systems based on the bithreshold approach to neural computation
have been considered in the paper. A hybrid 3-layer neural network model has been designed
whose first hidden layer consists of both single-threshold and bithreshold neural units. The second
hidden layer of the network contains neural units that serve as memory cells and uses the softmax
activation principle. The last layer consists of K neurons with predefined weights, where K is the
number of classes of the particular classification task for which the neural network is designed.</p>
      <p>The proposed classifier employs a model-based approach to synthesis, using the first layer as a
compressed, encoded representation of the training sample. The synthesis algorithm can be
considered an online algorithm because during one step of the synthesis process only the
current pattern is analyzed. The simulation results obtained on the optical recognition of handwritten
digits dataset demonstrated that the proposed NN model is competitive with popular machine
learning classifiers and outperforms the 2-layer bithreshold NN synthesized using the offline
algorithm from [22].</p>
      <p>As mentioned in the discussion section, the proposed classifier model is not flawless.
Further studies are necessary to improve the structure of the second hidden layer, as well as to reduce
the impact of the order of the training pairs on the size of the network and its performance.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Géron</surname>
          </string-name>
          ,
          <article-title>Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems</article-title>
          , 3rd ed.,
          <string-name>
            <surname>O'Reilly Media</surname>
          </string-name>
          , Sebastopol, CA,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.H.</given-names>
            <surname>Houssein</surname>
          </string-name>
          et al.,
          <article-title>Soft computing techniques for biomedical data analysis: open issues and challenges</article-title>
          ,
          <source>Artificial Intelligence Review</source>
          <volume>56</volume>
          (
          <year>2023</year>
          ):
          <fpage>2599</fpage>
          <lpage>2649</lpage>
          . doi:
          <volume>10</volume>
          .1007/s10462-023-10585-2
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>I. Izonin</surname>
          </string-name>
          et al.,
          <article-title>Cascade-based input-doubling classifier for predicting survival in allogeneic bone marrow transplants: small data case</article-title>
          ,
          <source>Computation 13.4</source>
          (
          <year>2025</year>
          ):
          <fpage>80</fpage>
          . doi:
          <volume>10</volume>
          .3390/computation1 3040080
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>O.</given-names>
            <surname>Mitsa</surname>
          </string-name>
          et al.,
          <article-title>A comparative study of machine learning algorithms and the prompting approach using GPT-3.5 Turbo for text categorization</article-title>
          , in: Z.
          <string-name>
            <surname>Hu</surname>
          </string-name>
          et al. (Eds.),
          <source>Lecture Notes on Data Engineering and Communications Technologies</source>
          , volume
          <volume>242</volume>
          , Springer, Cham,
          <year>2025</year>
          , pp.
          <fpage>156</fpage>
          <lpage>167</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>031</fpage>
          -84228-3_
          <fpage>13</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Andrashko</surname>
          </string-name>
          et al.,
          <article-title>A method for assessing the productivity trends of collective scientific subjects based on the modified PageRank algorithm</article-title>
          ,
          <source>Eastern-European Journal of Enterprise Technologies 1.4</source>
          (
          <year>2023</year>
          ):
          <fpage>41</fpage>
          <lpage>47</lpage>
          . doi:
          <volume>10</volume>
          .15587/
          <fpage>1729</fpage>
          -
          <lpage>4061</lpage>
          .
          <year>2023</year>
          .273929
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Geche</surname>
          </string-name>
          et al.,
          <article-title>Synthesis of time series forecasting scheme based on forecasting models system</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>1356</volume>
          ,
          <year>2015</year>
          , pp.
          <fpage>121</fpage>
          <lpage>136</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kaur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Verma</surname>
          </string-name>
          ,
          <article-title>Past, present and future of computational intelligence: a bibliometric analysis</article-title>
          ,
          <source>in: AIP Conference Proceedings</source>
          <volume>2916</volume>
          (
          <issue>1</issue>
          ),
          <year>2023</year>
          ,
          <volume>020001</volume>
          . doi:
          <volume>10</volume>
          .1063/5.0177490
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>V. K.</given-names>
            <surname>Venkatesan</surname>
          </string-name>
          et al.,
          <article-title>High-Performance artificial intelligence recommendation of quality research papers using effective collaborative approach</article-title>
          ,
          <source>Systems 11.2</source>
          (
          <year>2023</year>
          ):
          <fpage>81</fpage>
          . doi:
          <volume>10</volume>
          .3390/systems11020081.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Moon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Mounting angle prediction for automotive radar using complex-valued convolutional neural network</article-title>
          ,
          <source>Sensors</source>
          <volume>25</volume>
          .2 (
          <year>2025</year>
          ):
          <fpage>353</fpage>
          . doi:
          <volume>10</volume>
          .3390/s25020353
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>O.</given-names>
            <surname>Mitsa</surname>
          </string-name>
          et al.,
          <article-title>Ethnocultural, educational and scientific potential of the interactive dialects map</article-title>
          ,
          <source>in: IEEE International Conference on Smart Information Systems and Technologies</source>
          ,
          <string-name>
            <surname>SIST</surname>
          </string-name>
          <year>2023</year>
          . Astana, May 4-
          <issue>6</issue>
          , pp.
          <fpage>226</fpage>
          <lpage>231</lpage>
          . doi:
          <volume>10</volume>
          .1109/SIST58284.
          <year>2023</year>
          .10223544
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kotsovsky</surname>
          </string-name>
          ,
          <article-title>Hybrid 4-layer bithreshold neural network for multiclass classification</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3387</volume>
          ,
          <year>2023</year>
          , pp.
          <fpage>212</fpage>
          <lpage>223</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>I. Izonin</surname>
          </string-name>
          et al.,
          <article-title>A hybrid two-ML-based classifier to predict the survival of kidney transplants one month after transplantation</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3609</volume>
          ,
          <year>2023</year>
          , pp.
          <fpage>322</fpage>
          <lpage>331</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Dubey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. B.</given-names>
            <surname>Chaudhuri</surname>
          </string-name>
          ,
          <article-title>Activation functions in deep learning: a comprehensive survey and benchmark</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>503</volume>
          (
          <year>2022</year>
          ):
          <fpage>92</fpage>
          <lpage>108</lpage>
          . doi:
          <volume>10</volume>
          .1016/j.neucom.
          <year>2022</year>
          .
          <volume>06</volume>
          .111
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Apicella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Donnarumma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Isgrò</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Prevete</surname>
          </string-name>
          ,
          <article-title>A survey on modern trainable activation functions</article-title>
          ,
          <source>Neural Networks</source>
          <volume>138</volume>
          (
          <year>2021</year>
          ):
          <fpage>14</fpage>
          <lpage>32</lpage>
          . doi:
          <volume>10</volume>
          .1016/j.neunet.
          <year>2021</year>
          .
          <volume>01</volume>
          .026
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kotsovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Geche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Batyuk</surname>
          </string-name>
          ,
          <article-title>Bithreshold neural network classifier</article-title>
          ,
          <source>in: Proceedings of the IEEE 15th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2020, vol. 1, Lviv, Ukraine, 2020</source>
          , pp.
          <fpage>32</fpage>
          <lpage>35</lpage>
          . doi: 10.1109/CSIT49958.2020.9321883
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] D. R. Haring, Multi-threshold building blocks, IEEE Transactions on Electronic Computers EC-15.4 (1966): 662-663.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] A. Ngom, I. Stojmenović, J. Žunić, On the number of multilinear partitions and the computing capacity of multiple-valued multiple-threshold perceptrons, IEEE Transactions on Neural Networks 14.3 (2003): 469-477.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] V. Kotsovsky, Multithreshold neurons with smoothed activation functions, in: CEUR Workshop Proceedings, volume 3983, 2025, pp. 93-102.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] T. Szandała, Review and comparison of commonly used activation functions for deep neural networks, Studies in Computational Intelligence, volume 903, 2021, pp. 203-224.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] S. Olafsson, Y. S. Abu-Mostafa, The capacity of multilevel threshold functions, IEEE Transactions on Pattern Analysis and Machine Intelligence 10.2 (1988): 277-281.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] Z. Obradovic, I. Parberry, Learning with discrete multivalued neurons, Journal of Computer and System Sciences 49 (1994): 375-390.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] V. Kotsovsky, A. Batyuk, Representational capabilities and learning of bithreshold neural networks, in: S. Babichev et al. (Eds.), Advances in Intelligent Systems and Computing, volume 1246, Springer, Cham, 2021, pp. 499-514.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] V. Kotsovsky, Learning of multi-valued multithreshold neural units, in: CEUR Workshop Proceedings, volume 3688, 2024, pp. 39-49.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] N. Jiang, Y. X. Yang, X. M. Ma, Z. Z. Zhang, Using three layer neural network to compute multi-valued functions, in: Fourth International Symposium on Neural Networks, June 3-7, 2007, Nanjing, P.R. China, Part III, LNCS 4493, 2007, pp. 1-8.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] M. Anthony, J. Ratsaby, Large-width machine learning algorithm, Progress in Artificial Intelligence 9.3 (2020): 275-285.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] V. Kotsovsky, A. Batyuk, Multithreshold neural units and networks, in: Proceedings of the IEEE 18th International Conference on Computer Sciences and Information Technologies, CSIT 2023, Lviv, Ukraine, 2023, pp. 1-5.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] V. Kotsovsky, Synthesis of multithreshold neural network classifier, in: CEUR Workshop Proceedings, volume 3711, 2024, pp. 75-88.</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>[28] T. Gowda et al., Identification of threshold functions and synthesis of threshold networks, IEEE Transactions on Computer-Aided Design 30.5 (2011): 665-677.</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>[29] M. Nikodem, Synthesis of multithreshold threshold gates based on negative differential resistance devices, IET Circuits, Devices &amp; Systems 7.5 (2013): 232-242.</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>[30] V. Kotsovsky, A. Batyuk, Towards the design of bithreshold ANN regressor, in: 19th IEEE International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2024, Lviv, Ukraine, October 16-19, 2024, pp. 1-4.</mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>[31] F. Geche et al., Synthesis of the integer neural elements, in: Proceedings of the International Conference on Computer Sciences and Information Technologies, CSIT 2015, Lviv, Ukraine, 2015, pp. 121-136.</mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>[32] F. Pedregosa et al., Scikit-learn: machine learning in Python, Journal of Machine Learning Research 12 (2011): 2825-2830.</mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>[33] M. Kelly, R. Longjohn, K. Nottingham, The UCI machine learning repository, 2023. URL: http://archive.ics.uci.edu.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>