<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Assigning different activation functions in artificial neural networks with the goal of achieving higher prediction accuracy</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gytis Baravykas</string-name>
          <email>gytis.baravykas@ktu.lt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Justas Kardoka</string-name>
          <email>justas.kardoka@ktu.lt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Domas Grigaliunas</string-name>
          <email>domas.grigaliunas@ktu.lt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Darius Naujokaitis</string-name>
          <email>darius.naujokaitis@ktu.lt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Informatics, Kaunas University of Technology</institution>
          ,
          <addr-line>Studentu 50, 51368 Kaunas</addr-line>
          ,
          <country country="LT">Lithuania</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IVUS2024: Information Society and University Studies 2024</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Smart Grids and Renewable Energy Laboratory, Lithuanian Energy Institute</institution>
          ,
          <addr-line>44403 Kaunas</addr-line>
          ,
          <country country="LT">Lithuania</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The research paper explores the concept of using multiple activation functions in artificial neural networks and investigates their impact on model performance. The experiments conducted on various models such as AlexNet, ResNet50, TuNet, and SimpleNN reveal insights into the effectiveness of different activation function combinations. The results indicate that using multiple activation functions can lead to modest improvements in model performance, particularly in image segmentation tasks where modifications to the UNet architecture show significant enhancements. However, for time series regression/forecasting tasks, the experiments demonstrate that using multiple activation functions does not significantly improve prediction accuracy. Therefore, the paper concludes that while there are some benefits to using multiple activation functions in certain scenarios, the choice of activation function should be based on the specific task and dataset.</p>
      </abstract>
      <kwd-group>
        <kwd>Activation functions</kwd>
        <kwd>artificial neural networks</kwd>
        <kwd>machine learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Literature review</title>
      <p>Activation functions in an ANN are used to introduce non-linear relations to the data, so that the
network can better fit the results and improve the accuracy on a given task. They are a very common
part of ANNs and are often omitted from neural network structure diagrams. Many mathematical
functions have been introduced to achieve non-linearity, such as ReLU, Tanh, Sigmoid and others,
each tailored to specific tasks. In this paper we entertain the idea of using not one activation function
per layer or network, but multiple, assigning a different one to each neuron.</p>
      <p>
        The importance of activation functions is discussed in many recent works. Their importance is
based on their wide-spread usage in ANN architectures. Dubey has published a comprehensive
overview of the most common activation functions, along with their characteristics and a
performance comparison between them [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. They have found that different activation functions
are more suited for certain machine learning tasks, and that in certain cases, alternative choices
must be considered. Although there are some common choices, new activation functions are
constantly being developed [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5 ref6">2,3,4,5,6</xref>
        ]. Yu has created a modified activation function based on
ReLU, with the goal of increasing the accuracy of classification tasks [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Wang developed an
activation function as a better alternative to other commonly used activation functions [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The
developed activation function, Smish, performed better than other common activation functions in
classification tasks on open datasets. Wuraola has developed a family of activation functions that
are to be used in embedded systems [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The proposed activation functions were shown to be
computationally faster, and their use resulted in higher accuracy results than other common
activation functions in recurrent neural networks and logistic regression models. Kaytan has
introduced a new non-monotonic activation function capable of achieving better results than
other activation functions like Swish, Mish and others for image classification tasks [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Chai
developed a new model based on LSTM capable of achieving higher accuracy for short-term PV
generation forecasts [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The model uses a newly proposed activation function that helps solve the
gradient disappearance problem and ensures a high accuracy of the prediction results for the task of
short-term PV generation. There are also works in which the activation functions of the default
implementation of model architectures are switched with other, alternative activation functions.
Anami performed experiments in which they compared prediction results obtained by
switching the default activation function with other common activation functions [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
Wang has performed experiments in which they tried to use alternative activation functions in
VGG16, ResNet50 and LeNet architectures, achieving superior results [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Essai Ali has tried to
modify an LSTM by changing its Tanh functions to different activation functions [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The author
has achieved his aim of increasing the classification accuracy from 86% to 88% using the Weather
Reports dataset, and from 93% to 97% using the Japanese Vowels dataset.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
    </sec>
    <sec id="sec-4">
      <title>3.1. Activation functions</title>
      <p>In a fully connected layer each neuron computes a weighted sum of its inputs, to which an activation function is then applied. With per-neuron assignment, different neurons in the same layer apply different functions:</p>
      <p>h = ∑_{i=1}^{n} w_i ⋅ x_i + b (1)
a_1 = σ(h_1) (2)
a_2 = tanh(h_2) (3)
o_1 = a_1 + a_2 (4)
where h – hidden-layer pre-activations, w – weights, x – inputs, b – bias, a – activation function results and o –
outputs. In an artificial convolutional neural network activations play a similar role, but because
there are no actual neurons in a convolutional layer, a different application is required. For the
convolution layer, 2 approaches were introduced.</p>
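      <p>As a minimal sketch of the first, per-channel approach (using NumPy; the cyclic assignment of functions to channels and all names here are our illustrative assumptions, not the paper's implementation):</p>

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def per_channel_activation(x, activations):
    # Channel c of x (batch, channels, height, width) is passed through
    # activations[c % len(activations)], i.e. the chosen functions are
    # assigned to the channels in cyclic order.
    out = np.empty_like(x)
    for c in range(x.shape[1]):
        out[:, c] = activations[c % len(activations)](x[:, c])
    return out

# Feature maps as they would come out of a convolution layer.
x = np.random.randn(2, 4, 8, 8)
y = per_channel_activation(x, [relu, np.tanh])
```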
      <p>In regular CNN architectures there is often only one activation function in a convolution layer.
As displayed in the diagram in Figure 2.1, a different activation function can be applied to each channel
after the convolution layer. The second diagram, Figure 2.2, refers to another idea: applying multiple
activation functions to each matrix column. In the case of a 3x3 matrix, there are 3 columns in each
channel. Every slice has a specific activation applied to it.</p>
      <p>Some CNN architectures have a linear neuron layer, which typically has only one activation
function. The idea displayed in Figure 3 is to leave one activation in the convolution layers and only
have multiple activation functions in the linear neuron layers, specifically an activation function for
each neuron. As displayed in the diagram, boxes (1-4) can each have a specific function assigned,
creating a spectrum of variations: (1-tanh, 2-relu, 3-sigmoid, 4-softmax), (1-relu, 2-tanh, 3-sigmoid,
4-relu) and so on.</p>
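      <p>The per-neuron idea for a linear layer can be sketched as follows (a hypothetical NumPy sketch; the paper does not prescribe an implementation, so the function names and shapes are our assumptions):</p>

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def per_neuron_forward(x, W, b, acts):
    # h holds one pre-activation per neuron; neuron i then applies its
    # own activation function acts[i] instead of a shared one.
    h = x @ W + b
    return np.array([acts[i](h[i]) for i in range(h.shape[0])])

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))  # 3 inputs feeding 4 hidden neurons
b = np.zeros(4)
x = rng.normal(size=3)
# One variation from the spectrum described above: (1-tanh, 2-relu, 3-sigmoid, 4-relu)
y = per_neuron_forward(x, W, b, [np.tanh, relu, sigmoid, relu])
```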
      <p>For linear layers it is also possible to have a complete list of activation functions assigned. This
idea is experimented with later in this paper. The number of combinations of such a list can be calculated as
follows: in this case, 2 activation functions (ReLU, Tanh) raised to the power of 4 neurons equals 16 variations:
V = A^N (5)
where V – number of variations, A – number of selected activation functions and N – number of neurons.</p>
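      <p>Equation (5) can be checked by enumerating the assignments directly (an illustrative Python sketch; the variable names are ours):</p>

```python
from itertools import product

activations = ("ReLU", "Tanh")  # A = 2 selected activation functions
neurons = 4                     # N = 4 neurons in the linear layer

# Every possible per-neuron assignment of an activation to a neuron:
combos = list(product(activations, repeat=neurons))
assert len(combos) == len(activations) ** neurons  # V = A**N = 16
```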
      <p>It must also be noted that various activation functions can be used; the choice is not limited to the
most common activation functions such as ReLU, Tanh, Sigmoid, etc. The range of activation functions
tested in this work is detailed in the experiments section.</p>
    </sec>
    <sec id="sec-5">
      <title>3.2. Models</title>
      <p>
There has been a vast selection of CNN models proposed for image classification; many of them
have complex implementations and long training times. The models chosen for this paper are of low
to mid-range complexity, to test out the theory. Starting with SimpleNN, a simple neural network
with one hidden layer of N neurons. TuNet is a CNN with 2 convolutions, 2 pooling layers and 3
linear layers [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. AlexNet is a convolutional neural network (CNN) architecture that consists of
five convolutional layers, three fully connected layers, and two pooling layers [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The
convolutional layers extract features from the input images, while the pooling layers reduce the
dimensionality of the feature maps. The fully connected layers learn a mapping from the extracted
features to the output classes. Some of the key innovations introduced by AlexNet include the use of
rectified linear unit (ReLU) activation functions, dropout regularization, and data augmentation
techniques.
      </p>
      <p>ResNet50 derives its name from its depth, incorporating 50 layers [11]. Notably, ResNet50
addresses the challenge of training deep networks by introducing residual connections that enable
the direct flow of information across layers. This innovation mitigates the vanishing gradient
problem, allowing for the successful training of extremely deep networks.</p>
      <p>The architecture comprises building blocks known as residual blocks, each containing skip
connections that bypass one or more layers. These skip connections facilitate the smooth
propagation of gradients during backpropagation, enhancing the model's ability to capture intricate
features. Additionally, ResNet50 employs batch normalization to accelerate training convergence
and improve generalization performance.</p>
      <p>UNet was used for image segmentation tasks [12]. It is a popular model with several
modifications over the years [13,14,15]. The model improved on the results of previous image
segmentation models through its architecture, consisting of a contracting path used for capturing context
and a symmetric expanding path that enables precise localization [12]. The resulting
architecture consists of 23 convolutional layers and utilizes the ReLU activation
function. The model also heavily utilizes image augmentation, which enables it to achieve high
accuracy without relying on many training images.</p>
    </sec>
    <sec id="sec-6">
      <title>3.3. Datasets</title>
    </sec>
    <sec id="sec-6a">
      <title>3.3.1. Images</title>
      <p>Several image datasets are popular for testing the performance of CNN models. CIFAR-100 is a
dataset containing 60 000 32x32 color images with 100 classes (600 images per class). It is a subset of
the Tiny Images dataset and is commonly used for fine-grained image classification [16]. The
dataset contains a wide variety of images of objects, animals, and textures. The images are labeled
with both fine-grained and coarse labels. The fine-grained labels correspond to the specific object or
scene in the image, while the coarse labels correspond to the superclass of the object or scene.</p>
      <p>The German Traffic Sign Benchmark is a multi-class, single-image classification challenge held at
the International Joint Conference on Neural Networks (IJCNN) 2011 [17]. The following dataset
includes 43 classes of traffic signs and more than 50,000 images.</p>
      <p>The Cityscapes dataset is a popular image segmentation dataset that consists of 25 000 images
captured from a moving vehicle [13,14,15]. The images were taken in different cities in Germany
during different weather conditions. The dataset consists of 50 different classes. Each dataset item
consists of a horizontally joined image, in which the left image is the original photograph,
while the right image is the semantically segmented version of the image.</p>
    </sec>
    <sec id="sec-6b">
      <title>3.3.2. Tabular</title>
      <p>Two tabular datasets were incorporated in this paper: breast cancer and iris flower classification.
The Breast cancer dataset features are computed from a digitized image of a fine needle aspirate (FNA)
of a breast mass [18]. They describe characteristics of the cell nuclei present in the image. A few of
the images can be found at http://www.cs.wisc.edu/~street/images/.</p>
      <p>Iris flowers dataset is one of the earliest datasets used in literature on classification methods and
widely used in statistics and machine learning [19]. The data set contains 3 classes of 50 instances
each, where each class refers to a type of iris plant. One class is linearly separable from the other 2;
the latter are not linearly separable from each other. When performing experiments, Obaid’s work
was used as a benchmark for the comparison of results [20].</p>
    </sec>
    <sec id="sec-7">
      <title>3.3.3. Timeseries</title>
      <p>Timeseries data for Amazon stocks with stock price, closing price and other attributes was used
[21]. Additionally, a custom photovoltaic (PV) panel generation dataset was used. The data consists
of about a year of meteorological and PV generation data. The PV generation data was retrieved
from a PV station in Kaunas, Lithuania, meanwhile the publicly available meteorological data was
retrieved from Oikolab and from the Lithuanian Hydrometeorological Service. It was also attempted
to include METAR data on cloud conditions at different altitudes, but utilizing this data did not
provide any improvement to the results, so it was left out from the dataset. Based on the observed
linear relationships between different meteorological features and PV generation, certain
meteorological features were chosen to be used in the experiments (see Figure 4).</p>
      <p>As can be seen from the relationships between different features, a strong linear relationship
between PV generation and both air temperature and surface solar radiation has been observed. It was noted
that using other meteorological data improved the results, although these features did not seem to
have a linear relationship with the PV generation data. In total, the dataset consists of the following
11 features (see Table 1).</p>
    </sec>
    <sec id="sec-8">
      <title>3.4. Environment</title>
      <p>The Google Colab environment with a single NVIDIA Tesla T4 GPU was used for experiments with
AlexNet and ResNet50 on CIFAR-100. For the GTSRB, UNet and LSTM experiments, the models were
trained on a two Tesla T4 GPU setup. Amazon stock close predictions were performed on a Kaggle-provided CPU.</p>
    </sec>
    <sec id="sec-9">
      <title>4. Experiments and results</title>
    </sec>
    <sec id="sec-10">
      <title>4.1. Image classification</title>
    </sec>
    <sec id="sec-11">
      <title>4.1.1. CIFAR-100 with AlexNet</title>
      <p>Inspired by Sharma’s work [22], we chose AlexNet as the primary target. The main reason for
choosing this architecture was that it has linear layers alongside convolution blocks. We began
experiments with the OriginalAlexNet implementation as a baseline with Tanh. Next, we
experimented with changing only the linear layers – changing one layer, then changing both. The
change was that instead of applying a single activation function, we applied 2 or 3 in cyclic order.
The best results were with the Tanh and Softmax combination of functions – a 1.14% improvement in
testing accuracy compared to the ReLU baseline; however, the Tanh baseline was still superior.</p>
      <p>Later, we expanded experimentation to modifying the convolutional (CNN) layers.
Here the implementation consisted of changing activation functions per channel. This showed
marginally better results than the OriginalAlexNet with ReLU – a 0.36% improvement.</p>
      <p>For experimentation, the hyperparameters were the following: learning rate – 0.0001, batch size –
256 and number of epochs – 40.</p>
    </sec>
    <sec id="sec-11b">
      <title>4.1.2. CIFAR-100 with ResNet50</title>
      <p>The tested variants were the ResNet50 baseline and four modified models: ResNet50CustomResiduala
with Tanh; ResNet50CustomResidualb with ReLU and Tanh; ResNet50CustomResidualc with ReLU and
SoftMax; and ResNet50CustomResidualr with a random list of activation functions drawn from Tanh,
Softmax and ReLU.</p>
    </sec>
    <sec id="sec-12">
      <title>4.1.3 GTSRB with TuNet</title>
      <p>The images to be classified are pre-processed in the same manner and with the same training parameters
as in the previous experiments, while the fixed image size is 32 by 32 pixels. The training
parameters for TuNet are as follows: optimizer – Adam, learning rate – 0.001, loss function – cross
entropy and batch size – 32. As can be seen in Table 4, the results of the TuNet baseline are
generally worse than those of the modified architecture:</p>
    </sec>
    <sec id="sec-13">
      <title>4.2. Cityscapes with UNet</title>
      <p>For the image segmentation task, the popular Cityscapes dataset was chosen alongside the UNet
model. The following parameters were the same for all the experiments using UNet: Adam
optimizer with a learning rate of 0.001, the mean-squared error as the loss function, a batch size of 4
and 20 as the number of epochs for training.</p>
      <p>As it can be seen from the results of the experiments, a significant Dice metric increase of about
10% was achieved by various activation function combinations (see Table 5).</p>
      <p>As can be seen from the table, using almost any combination of activation functions can result
in better prediction results in the case of UNet. It is also observed that even changing the activation
in the baseline model from ReLU to Tanh improved the results by a significant amount as well.</p>
    </sec>
    <sec id="sec-14">
      <title>4.3 Time series regression/forecasting</title>
    </sec>
    <sec id="sec-15">
      <title>4.3.1 Simple NN on Amazon stock prediction</title>
      <p>Experiments were performed on Amazon stock timeseries data to predict the closing price for the
next day. An architecture named SimpleNN was used. It is a neural network with 1 input cell, 14
hidden layer cells and 1 output. The following parameters were used in the experiment: optimizer –
Adam, learning rate – 0.001, loss function – mean-squared error, batch size – 16, lag values – 7 and
number of training epochs – 5.</p>
      <p>The experiment compares the same model architecture in two settings: one activation function for the
whole network versus a different activation per neuron (see Table 6).</p>
      <p>Among the tested activation configurations were (ReLU, Softmax) and per-neuron lists such as
(ReLU, ReLU, ReLU, ReLU, ReLU, Sigmoid, ReLU, ReLU, Sigmoid, ReLU, ReLU, Sigmoid, ReLU, Sigmoid).</p>
      <p>Additionally, all possible combinations of different activation functions sets have been tested (see
model PerNeuronList).</p>
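      <p>The exhaustive search over activation lists can be sketched as follows (a hypothetical sketch: train_and_score is a dummy stand-in for training SimpleNN once per combination, and only 4 neurons are enumerated here for brevity):</p>

```python
from itertools import product

ACTIVATIONS = ("relu", "sigmoid")

def train_and_score(combo):
    # Dummy stand-in: the real procedure would train SimpleNN with this
    # per-neuron activation list and return its validation error.
    return sum(1 for a in combo if a == "sigmoid")

# Enumerate every per-neuron list (4 neurons here for brevity; SimpleNN's
# 14-neuron hidden layer gives 2**14 lists) and keep the best-scoring one.
best = min(product(ACTIVATIONS, repeat=4), key=train_and_score)
```

Note that the search cost grows as A^N, which is why exhaustive enumeration is only practical for small layers or few candidate functions.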
      <p>As can be seen from the results, there is an increase in accuracy in certain cases, and it can also
be observed that finding the best possible set of activation functions yielded the best results out of
the experiments.</p>
    </sec>
    <sec id="sec-16">
      <title>4.3.2 Custom PV dataset with LSTM</title>
      <p>Experiments were performed using a time-series dataset for forecasting PV generation. An LSTM
model was used, as it is often utilized for solving PV generation forecast tasks [23,24,25,26,27]. For
performing the forecasts, the output of the previous step is used as the input of the following
training step. The following parameters were used for the experiments: Adam optimizer with a
learning rate of 0.001, mean-squared error for the error metric, a batch size of 8, 12 lag values for the
PV data, and 20 training epochs.</p>
      <p>The parameters for the experiments were chosen based on experiments performed using
different sets of parameters. The batch size refers to the number of predictions retrieved from the
model output, and the lag values refer to the number of previous predictions to use as input for the
next prediction. Based on tests using different lag values, a value of 12 was noticed to be one of the
best values for this parameter, although this parameter did not seem to have much impact on the
accuracy of predictions. Regarding transformations of data, the training data has been standardized
so that the ranges of values would be the same for all features.</p>
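      <p>The standardization step can be sketched as a per-feature z-score (an illustrative NumPy sketch; the feature values are made up):</p>

```python
import numpy as np

# Made-up training matrix: rows are timestamps, columns are two features
# on very different scales (e.g. temperature and solar radiation).
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_std = (X - mu) / sigma  # each column now has zero mean and unit variance
```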
      <p>As can be seen from Table 7, there is no significant improvement based on testing RMSLE.
Although many experiments yielded similar results to the baseline, there was not a single
experiment which yielded better results than the baseline. It can also be observed that an increase in
the number of different activation functions used does not improve the forecast results either.</p>
    </sec>
    <sec id="sec-17">
      <title>4.4 Tabular</title>
      <p>Tabular data is still widely used in machine learning tasks. In this paper we chose two datasets
to experiment with: Iris flowers and Breast cancer classification. Both experiments
have the following training parameters: optimizer – SGD, learning rate – 0.01, loss function – cross
entropy loss and number of training epochs – 200.</p>
      <p>From the results displayed in Table 8, comparing one activation versus multiple for this Iris flowers
classification task, there is no improvement compared to the best-suited activation function.</p>
      <p>Experiments performed on the breast cancer dataset are visible in Table 9, which shows the testing
results after training. As can be seen, there is a slight improvement for the model with multiple
activation functions.</p>
      <p>Additionally, an activation function set from a large number of combinations was selected, and the
accuracy using it is better compared to one activation function (see Table 10).</p>
      <p>It should also be noted that better results were achieved than from the SVM described in Obaid’s
work. As can be seen from the results, there is a significant accuracy increase for the PerNeuron
models, whilst the most significant increase can be seen when finding the best activation function
list from all possible combinations.</p>
    </sec>
    <sec id="sec-18">
      <title>5. Conclusions and discussion</title>
      <p>The research paper explores the concept of using multiple activation functions in artificial neural
networks. It discusses the role of activation functions in introducing non-linear relations to improve
the accuracy of tasks. The paper investigates different approaches to incorporating multiple
activation functions, including assigning a different function to each neuron or channel.</p>
      <p>The experiments included using models such as AlexNet, ResNet50, TuNet, and SimpleNN. In the
AlexNet experiment, different activation function combinations were tested in both linear layers
and convolutional neural network (CNN) layers. The results showed that using OriginalAlexNet
with Tanh activation function yielded the best overall performance. The ResNet50 experiments
resulted in one combination performing marginally better than any of the single-function baselines. The
TuNet and SimpleNN experiments aimed to evaluate the performance of these specific architectures
on their respective datasets. Overall, the experiments provided insights into the impact of activation
function combinations on model performance, with modest improvements observed compared to
using a single activation function. The datasets used in the experiments included CIFAR-100,
GTSRB, Breast Cancer Wisconsin (Diagnostic), Iris flowers, and Amazon stocks. In image
segmentation tasks, modifying the UNet architecture with different activation function
combinations led to significant improvements in the Dice metric. Even changing the activation
function in the baseline model from ReLU to Tanh showed improved results. For time series
regression/forecasting tasks, the experiments show that using multiple activation functions does not
significantly improve the accuracy of predictions. This paper also hints at the idea of a full list of
activation functions, which would learn a relation with the specific data each neuron receives – an
idea which requires further analysis.</p>
      <p>Overall, the paper concludes that while using multiple activation functions can have some
benefits in certain scenarios, the improvements are not substantial compared to using a single
activation function. The choice of activation function should be based on the specific task, dataset
and its features.</p>
      <p>[10] https://doi.org/10.1145/3065386.
[11] K. He, X. Zhang, S. Ren, J. Sun, Deep Residual Learning for Image Recognition, (2015). https://doi.org/10.48550/arXiv.1512.03385.
[12] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation, (2015). https://doi.org/10.48550/arXiv.1505.04597.
[13] H. Bai, L. Liu, Q. Han, Y. Zhao, Y. Zhao, A novel UNet segmentation method based on deep learning for preferential flow in soil, Soil &amp; Tillage Research. 233 (2023) 105792. https://doi.org/10.1016/j.still.2023.105792.
[14] K.K. Wong, A. Zhang, K. Yang, S. Wu, D.N. Ghista, GCW-UNet segmentation of cardiac magnetic resonance images for evaluation of left atrial enlargement, Computer Methods and Programs in Biomedicine. 221 (2022) 106915. https://doi.org/10.1016/j.cmpb.2022.106915.
[15] G. Rani, P. Thakkar, A. Verma, V. Mehta, R. Chavan, V.S. Dhaka, R.K. Sharma, E. Vocaturo, E. Zumpano, KUB-UNet: Segmentation of Organs of Urinary System from a KUB X-ray Image, Computer Methods and Programs in Biomedicine. 224 (2022) 107031. https://doi.org/10.1016/j.cmpb.2022.107031.
[16] A. Krizhevsky, Learning Multiple Layers of Features from Tiny Images, 2009. https://www.semanticscholar.org/paper/Learning-Multiple-Layers-of-Features-from-TinyKrizhevsky/5d90f06bb70a0a3dced62413346235c02b1aa086 (accessed January 17, 2024).
[17] J. Stallkamp, M. Schlipsing, J. Salmen, C. Igel, Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition, Neural Networks. 32 (2012) 323–332. https://doi.org/10.1016/j.neunet.2012.02.016.
[18] W.N. Street, W.H. Wolberg, O.L. Mangasarian, Nuclear feature extraction for breast tumor diagnosis, in: R.S. Acharya, D.B. Goldgof (Eds.), San Jose, CA, 1993: pp. 861–870. https://doi.org/10.1117/12.148698.
[19] A. Unwin, K. Kleinman, The Iris Data Set: In Search of the Source of Virginica, Significance. 18 (2021) 26–29. https://doi.org/10.1111/1740-9713.01589.
[20] O. Ibrahim Obaid, M. Mohammed, M.K. Abd Ghani, S. Mostafa, F. Al-Dhief, Evaluating the Performance of Machine Learning Techniques in the Classification of Wisconsin Breast Cancer, International Journal of Engineering and Technology. 7 (2018) 160–166. https://doi.org/10.14419/ijet.v7i4.36.23737.
[21] Amazon.com, Inc. (AMZN) Stock Historical Prices &amp; Data - Yahoo Finance, (2024). https://finance.yahoo.com/quote/AMZN/history/ (accessed January 17, 2024).
[22] N. Sharma, V. Jain, A. Mishra, An Analysis Of Convolutional Neural Networks For Image Classification, Procedia Computer Science. 132 (2018) 377–384. https://doi.org/10.1016/j.procs.2018.05.198.
[23] T. Limouni, R. Yaagoubi, K. Bouziane, K. Guissi, E.H. Baali, Accurate one step and multistep forecasting of very short-term PV power using LSTM-TCN model, Renewable Energy. 205 (2023) 1010–1024. https://doi.org/10.1016/j.renene.2023.01.118.
[24] L. Wang, M. Mao, J. Xie, Z. Liao, H. Zhang, H. Li, Accurate solar PV power prediction interval method based on frequency-domain decomposition and LSTM model, Energy (Oxford). 262 (2023) 125592. https://doi.org/10.1016/j.energy.2022.125592.
[25] X. Huang, Q. Li, Y. Tai, Z. Chen, J. Liu, J. Shi, W. Liu, Time series forecasting for hourly photovoltaic power using conditional generative adversarial network and Bi-LSTM, Energy (Oxford). 246 (2022) 123403. https://doi.org/10.1016/j.energy.2022.123403.
[26] H. Gao, S. Qiu, J. Fang, N. Ma, J. Wang, K. Cheng, H. Wang, Y. Zhu, D. Hu, H. Liu, J. Wang, Short-Term Prediction of PV Power Based on Combined Modal Decomposition and NARXLSTM-LightGBM, Sustainability (Basel, Switzerland). 15 (2023) 8266. https://doi.org/10.3390/su15108266.
[27] J. Ospina, A. Newaz, M.O. Faruque, Forecasting of PV plant output using hybrid wavelet-based LSTM-DNN structure model, IET Renewable Power Generation. 13 (2019) 1087–1095. https://doi.org/10.1049/iet-rpg.2018.5779.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.R.</given-names>
            <surname>Dubey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.K.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.B.</given-names>
            <surname>Chaudhuri</surname>
          </string-name>
          ,
          <article-title>Activation functions in deep learning: A comprehensive survey and benchmark</article-title>
          ,
          <source>Neurocomputing (Amsterdam)</source>
          .
          <volume>503</volume>
          (
          <year>2022</year>
          )
          <fpage>92</fpage>
          -
          <lpage>108</lpage>
          . https://doi.org/10.1016/j.neucom.
          <year>2022</year>
          .
          <volume>06</volume>
          .111.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Adu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Anokye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.A.</given-names>
            <surname>Ayidzoe</surname>
          </string-name>
          ,
          <article-title>RMAF: ReLU-Memristor-Like Activation Function for Deep Learning</article-title>
          ,
          <source>IEEE Access</source>
          .
          <volume>8</volume>
          (
          <year>2020</year>
          )
          <fpage>72727</fpage>
          -
          <lpage>72741</lpage>
          . https://doi.org/10.1109/ACCESS.2020.2987829.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Smish: A Novel Activation Function for Deep Learning Methods</article-title>
          ,
          <source>Electronics (Basel)</source>
          .
          <volume>11</volume>
          (
          <year>2022</year>
          )
          <fpage>540</fpage>
          . https://doi.org/10.3390/electronics11040540.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Wuraola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.K.</given-names>
            <surname>Nguang</surname>
          </string-name>
          ,
          <article-title>Efficient activation functions for embedded inference engines</article-title>
          ,
          <source>Neurocomputing</source>
          .
          <volume>442</volume>
          (
          <year>2021</year>
          )
          <fpage>73</fpage>
          -
          <lpage>88</lpage>
          . https://doi.org/10.1016/j.neucom.2021.02.030.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kaytan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>İ.B.</given-names>
            <surname>Aydilek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yeroğlu</surname>
          </string-name>
          ,
          <article-title>Gish: a novel activation function for image classification</article-title>
          ,
          <source>Neural Comput &amp; Applic</source>
          .
          <volume>35</volume>
          (
          <year>2023</year>
          )
          <fpage>24259</fpage>
          -
          <lpage>24281</lpage>
          . https://doi.org/10.1007/s00521-023-09035-5.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Chai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>PV Power Prediction Based on LSTM With Adaptive Hyperparameter Adjustment</article-title>
          ,
          <source>IEEE Access</source>
          .
          <volume>7</volume>
          (
          <year>2019</year>
          )
          <fpage>115473</fpage>
          -
          <lpage>115486</lpage>
          . https://doi.org/10.1109/ACCESS.2019.2936597.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>B.S.</given-names>
            <surname>Anami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.V.</given-names>
            <surname>Sagarnal</surname>
          </string-name>
          ,
          <article-title>Influence of Different Activation Functions on Deep Learning Models in Indoor Scene Images Classification</article-title>
          ,
          <source>Pattern Recognition and Image Analysis</source>
          .
          <volume>32</volume>
          (
          <year>2022</year>
          )
          <fpage>78</fpage>
          -
          <lpage>88</lpage>
          . https://doi.org/10.1134/S1054661821040039.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W.</given-names>
            <surname>Hao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yizhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yaqin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhili</surname>
          </string-name>
          ,
          <article-title>The Role of Activation Function in CNN</article-title>
          ,
          <source>in: 2020 2nd International Conference on Information Technology and Computer Application (ITCA)</source>
          ,
          <year>2020</year>
          : pp.
          <fpage>429</fpage>
          -
          <lpage>432</lpage>
          . https://doi.org/10.1109/ITCA52113.2020.00096.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.H.</given-names>
            <surname>Essai Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.B.</given-names>
            <surname>Abdel-Raman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.A.</given-names>
            <surname>Badry</surname>
          </string-name>
          ,
          <article-title>Developing Novel Activation Functions Based Deep Learning LSTM for Classification</article-title>
          ,
          <source>IEEE Access</source>
          .
          <volume>10</volume>
          (
          <year>2022</year>
          )
          <fpage>97259</fpage>
          -
          <lpage>97275</lpage>
          . https://doi.org/10.1109/ACCESS.2022.3205774.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <article-title>ImageNet Classification with Deep Convolutional Neural Networks</article-title>
          ,
          <source>Neural Information Processing Systems</source>
          .
          <volume>25</volume>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>