<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Assigning different activation functions in artificial neural networks with the goal of achieving higher prediction accuracy *</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Gytis</forename><surname>Baravykas</surname></persName>
							<email>gytis.baravykas@ktu.lt</email>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Informatics</orgName>
								<orgName type="institution">Kaunas University of Technology</orgName>
								<address>
									<addrLine>Studentu 50</addrLine>
									<postCode>51368</postCode>
									<settlement>Kaunas</settlement>
									<country key="LT">Lithuania</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Justas</forename><surname>Kardoka</surname></persName>
							<email>justas.kardoka@ktu.lt</email>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Informatics</orgName>
								<orgName type="institution">Kaunas University of Technology</orgName>
								<address>
									<addrLine>Studentu 50</addrLine>
									<postCode>51368</postCode>
									<settlement>Kaunas</settlement>
									<country key="LT">Lithuania</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Domas</forename><surname>Grigaliunas</surname></persName>
							<email>domas.grigaliunas@ktu.lt</email>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Informatics</orgName>
								<orgName type="institution">Kaunas University of Technology</orgName>
								<address>
									<addrLine>Studentu 50</addrLine>
									<postCode>51368</postCode>
									<settlement>Kaunas</settlement>
									<country key="LT">Lithuania</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Darius</forename><surname>Naujokaitis</surname></persName>
							<email>darius.naujokaitis@ktu.lt</email>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Informatics</orgName>
								<orgName type="institution">Kaunas University of Technology</orgName>
								<address>
									<addrLine>Studentu 50</addrLine>
									<postCode>51368</postCode>
									<settlement>Kaunas</settlement>
									<country key="LT">Lithuania</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="laboratory">Smart Grids and Renewable Energy Laboratory</orgName>
								<orgName type="institution">Lithuanian Energy Institute</orgName>
								<address>
									<postCode>44403</postCode>
									<settlement>Kaunas</settlement>
									<country key="LT">Lithuania</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="department">IVUS2024: Information Society</orgName>
								<orgName type="institution">University Studies</orgName>
								<address>
									<addrLine>2024, May 17</addrLine>
									<settlement>Kaunas</settlement>
									<country key="LT">Lithuania</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Assigning different activation functions in artificial neural networks with the goal of achieving higher prediction accuracy *</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">5C647556633742BB3870FEC7DF4E10CB</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T16:29+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Activation functions</term>
					<term>artificial neural networks</term>
					<term>machine learning</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The research paper explores the concept of using multiple activation functions in artificial neural networks and investigates their impact on model performance. The experiments conducted on various models such as AlexNet, ResNet50, TuNet, and SimpleNN reveal insights into the effectiveness of different activation function combinations. The results indicate that using multiple activation functions can lead to modest improvements in model performance, particularly in image segmentation tasks where modifications to the UNet architecture show significant enhancements. However, for time series regression/forecasting tasks, the experiments demonstrate that using multiple activation functions does not significantly improve prediction accuracy. Therefore, the paper concludes that while there are some benefits to using multiple activation functions in certain scenarios, the choice of activation function should be based on the specific task and dataset.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Artificial neural networks (ANNs) are becoming increasingly relevant. Although the idea of ANNs spans multiple decades, various ANN architectures are still being actively developed to this day. Among the most important components of ANNs are activation functions. They are often used to introduce non-linearity and, in turn, allow ANNs to capture intricate features in the data. Although different activation functions have been developed and studied, there is no existing body of work that considers the choice of activation functions for solar power generation forecasting. In this paper, we propose a new approach for improving the results of ANN predictions by changing the activation functions in the ANN. We have chosen to test our approach on a range of different machine learning tasks, with the goal of introducing a new, alternative hyperparameter that works across different ANN architectures.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Literature review</head><p>Activation functions in an ANN are used to introduce non-linear relations to the data, so that the network can better fit the data and improve the accuracy on a given task. They are a very common part of ANNs and are often omitted from neural network structure diagrams. Many mathematical functions have been introduced to achieve non-linearity, such as ReLU, Tanh, Sigmoid and others, each tailored to specific tasks. In this paper we entertain the idea of using not one activation function per layer or network, but multiple, assigning a different one to each neuron.</p><p>The importance of activation functions is discussed in many recent works, and is reflected in their widespread usage in ANN architectures. Dubey has published a comprehensive overview of the most common activation functions, along with their characteristics and a performance comparison between them <ref type="bibr" target="#b0">[1]</ref>. They have found that different activation functions are better suited for certain machine learning tasks, and that in certain cases, alternative choices must be considered. Although there are some common choices, new activation functions are constantly being developed <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b5">6]</ref>. Yu has created a modified activation function based on ReLU, with the goal of increasing the accuracy of classification tasks <ref type="bibr" target="#b1">[2]</ref>. Wang developed an activation function as a better alternative to other commonly used activation functions <ref type="bibr" target="#b2">[3]</ref>. The developed activation function, Smish, performed better than other common activation functions in classification tasks on open datasets. 
Wuraola has developed a family of activation functions intended for use in embedded systems <ref type="bibr" target="#b3">[4]</ref>. The proposed activation functions were shown to be computationally faster, and their use resulted in higher accuracy than other common activation functions in recurrent neural networks and logistic regression models. Kaytan has introduced a new non-monotonic activation function capable of achieving higher results than activation functions like Swish, Mish and others for image classification tasks <ref type="bibr" target="#b4">[5]</ref>. Chai developed a new model based on LSTM capable of achieving higher accuracy for short-term PV generation forecasts <ref type="bibr" target="#b5">[6]</ref>. The model uses a newly proposed activation function that helps solve the gradient disappearance problem and ensures high accuracy of the prediction results for short-term PV generation forecasting. There are also works in which the activation functions of the default implementation of model architectures are switched with other, alternative activation functions. Anami performed experiments comparing prediction results after switching the default activation function with other common activation functions <ref type="bibr" target="#b6">[7]</ref>. Wang has performed experiments in which they tried alternative activation functions in the VGG16, ResNet50 and LeNet architectures, achieving superior results <ref type="bibr" target="#b7">[8]</ref>. Essai Ali has tried to modify an LSTM by changing its Tanh functions to different activation functions <ref type="bibr" target="#b8">[9]</ref>. The author achieved his aim of increasing the classification accuracy from 86% to 88% on the Weather Reports dataset, and from 93% to 97% on the Japanese Vowels dataset. Consider the concept displayed in Figure <ref type="figure" target="#fig_0">1</ref>. 
In this example we have an input layer, a hidden layer of 2 neurons and one output layer. Each neuron has a different function applied to it. The calculations for such a network are as follows:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Activation functions</head><formula xml:id="formula_0">ℎ_𝑖 = ∑_{𝑗=1}^{𝑛} 𝑤_{𝑖𝑗} ⋅ 𝑥_𝑗 + 𝑏_𝑖 (1)
𝑧_1 = 𝑟𝑒𝑙𝑢(ℎ_1) (2)
𝑧_2 = 𝑡𝑎𝑛ℎ(ℎ_2) (3)
𝑜_1 = 𝑧_1 𝑤_𝑟𝑒𝑙𝑢 + 𝑧_2 𝑤_𝑡𝑎𝑛ℎ (4)</formula><p>where ℎ denotes the hidden layer pre-activations, 𝑤 the weights, 𝑥 the inputs, 𝑏 the biases, 𝑧 the activation function results and 𝑜 the outputs. In a convolutional neural network, activations play a similar role, but because there are no actual neurons in a convolutional layer, a different application is required. For the convolution layer, 2 approaches were introduced. For linear layers it is also possible to have a complete list of activation functions assigned; this idea is experimented with later in this paper. The number of combinations of such a list can be calculated as follows. In this case, 2 activation functions (ReLU, Tanh) raised to the power of 4 neurons equals 16 variations:</p><formula xml:id="formula_1">𝑣 = 𝑒^𝑛<label>(5)</label></formula><p>where 𝑣 is the number of variations, 𝑒 the number of elected activation functions and 𝑛 the number of neurons. It must also be noted that various activation functions can be used; the choice is not limited to the most commonly used activation functions such as ReLU, Tanh, Sigmoid, etc. The range of activation functions tested in this work is detailed in the experiments section.</p></div>
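As a concrete illustration of equations (1)-(4), the per-neuron scheme can be sketched in a few lines of NumPy. The weight values below are hypothetical toy numbers, not taken from the paper's experiments:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Hypothetical toy weights for the 2-neuron hidden layer of Figure 1.
W = np.array([[0.5, -0.2],   # weights into hidden neuron 1
              [0.3,  0.8]])  # weights into hidden neuron 2
b = np.array([0.1, -0.1])
w_out = np.array([0.7, 0.4])  # output weights w_relu, w_tanh

def forward(x):
    h = W @ x + b                         # Eq. (1)
    z1 = relu(h[0])                       # Eq. (2): ReLU on neuron 1
    z2 = np.tanh(h[1])                    # Eq. (3): Tanh on neuron 2
    return z1 * w_out[0] + z2 * w_out[1]  # Eq. (4)

o = forward(np.array([1.0, 2.0]))
# Eq. (5): with 2 elected activations over 4 neurons, v = 2**4 = 16 variations
```

Note that each hidden neuron applies its own non-linearity before the weighted sum in the output, which is exactly what makes the assignment of functions a per-neuron choice rather than a per-layer one.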
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Models</head><p>A vast selection of CNN models has been proposed for image classification; many of them have complex implementations and long training times. The models chosen for this paper are of low to mid-range complexity, in order to test the theory. Starting with SimpleNN, a simple neural network with one hidden layer of N neurons. TuNet is a CNN with 2 convolutions, 2 pooling layers and 3 linear layers <ref type="bibr" target="#b9">[10]</ref>. AlexNet is a convolutional neural network (CNN) architecture that consists of five convolutional layers, three fully connected layers, and two pooling layers <ref type="bibr" target="#b9">[10]</ref>. The convolutional layers extract features from the input images, while the pooling layers reduce the dimensionality of the feature maps. The fully connected layers learn a mapping from the extracted features to the output classes. Some of the key innovations introduced by AlexNet include the use of rectified linear unit (ReLU) activation functions, dropout regularization, and data augmentation techniques.</p><p>ResNet50 derives its name from its depth, incorporating 50 layers <ref type="bibr" target="#b10">[11]</ref>. Notably, ResNet50 addresses the challenge of training deep networks by introducing residual connections that enable the direct flow of information across layers. This innovation mitigates the vanishing gradient problem, allowing for the successful training of extremely deep networks.</p><p>The architecture comprises building blocks known as residual blocks, each containing skip connections that bypass one or more layers. These skip connections facilitate the smooth propagation of gradients during backpropagation, enhancing the model's ability to capture intricate features. 
Additionally, ResNet50 employs batch normalization to accelerate training convergence and improve generalization performance.</p><p>UNet was used for the image segmentation tasks <ref type="bibr" target="#b11">[12]</ref>. It is a popular model with several modifications over the years <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b13">14,</ref><ref type="bibr" target="#b14">15]</ref>. The model improved on the results of previous image segmentation models through its architecture, which consists of a contracting path used for capturing context and a symmetric expanding path that enables precise localization <ref type="bibr" target="#b11">[12]</ref>. The resulting architecture consists of 23 convolutional layers and utilizes the ReLU activation function. The model also heavily utilizes image augmentation, which enables it to achieve high accuracy without relying on many training images.</p></div>
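The skip connection at the heart of ResNet50's residual blocks can be illustrated with a minimal sketch; the 2x2 matrix below is a hypothetical stand-in for a block's convolutions, not the actual architecture:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Toy stand-in for the learned transformation F(x) inside a residual block.
W = np.array([[0.1, 0.0],
              [0.0, 0.1]])

def residual_block(x):
    # relu(F(x) + x): the input bypasses the transformation via the skip
    # path, so gradients can flow directly through the identity term.
    return relu(W @ x + x)

y = residual_block(np.array([1.0, 2.0]))
```

Because the identity term contributes a constant gradient of 1, stacking many such blocks avoids the vanishing gradient problem described above.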
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Datasets</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.1.">Images</head><p>Several image datasets are popular for testing the performance of CNN models. CIFAR-100 is a dataset containing 60 000 32x32 color images with 100 classes (600 images per class). It is a subset of the Tiny Images dataset and is commonly used for fine-grained image classification <ref type="bibr" target="#b15">[16]</ref>. The dataset contains a wide variety of images of objects, animals, and textures. The images are labeled with both fine-grained and coarse labels. The fine-grained labels correspond to the specific object or scene in the image, while the coarse labels correspond to the superclass of the object or scene.</p><p>The German Traffic Sign Benchmark is a multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011 <ref type="bibr" target="#b16">[17]</ref>. The dataset includes 43 classes of traffic signs and more than 50,000 images.</p><p>The Cityscapes dataset is a popular image segmentation dataset that consists of 25 000 images captured from a moving vehicle <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b13">14,</ref><ref type="bibr" target="#b14">15]</ref>. The images were taken in different cities in Germany under different weather conditions. The dataset consists of 50 different classes. Each dataset item consists of a horizontally joined image, in which the left half is the original photograph, while the right half is the semantically segmented version of the image.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.2.">Tabular</head><p>Two tabular datasets were incorporated in this paper: breast cancer and iris flower classification. The breast cancer dataset's features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass <ref type="bibr" target="#b17">[18]</ref>. They describe characteristics of the cell nuclei present in the image. A few of the images can be found at http://www.cs.wisc.edu/~street/images/.</p><p>The iris flowers dataset is one of the earliest datasets used in the literature on classification methods and is widely used in statistics and machine learning <ref type="bibr" target="#b18">[19]</ref>. The dataset contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are not linearly separable from each other. When performing the experiments, Obaid's work was used as a benchmark for the comparison of results <ref type="bibr" target="#b19">[20]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.3.">Timeseries</head><p>Time series data for Amazon stocks, with stock price, closing price and other attributes, was used <ref type="bibr" target="#b20">[21]</ref>. Additionally, a custom photovoltaic (PV) panel generation dataset was used. The data consists of about a year of meteorological and PV generation data. The PV generation data was retrieved from a PV station in Kaunas, Lithuania, while the publicly available meteorological data was retrieved from Oikolab and from the Lithuanian Hydrometeorological Service. We also attempted to include METAR data on cloud conditions at different altitudes, but utilizing this data did not improve the results, so it was left out of the dataset. Based on the observed linear relationships between different meteorological features and PV generation, certain meteorological features were chosen for the experiments (see Figure <ref type="figure" target="#fig_3">4</ref>). A strong linear relationship was observed between PV generation and both air temperature and surface solar radiation. It was noted that using other meteorological data improved the results as well, although these features did not seem to have a linear relationship with the PV generation data. In total, the dataset consists of the following 11 features (see Table <ref type="table" target="#tab_0">1</ref>). As can be seen from the table, a wide range of different meteorological variables was used.</p></div>
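The lagged-input construction used in the time series experiments can be sketched as follows; the synthetic series and the helper name `make_lagged` are illustrative, not the paper's code:

```python
import numpy as np

def make_lagged(series, lags=12):
    # Each sample holds the previous `lags` values; the target is the
    # next value in the series (the paper uses 12 lag values for PV data).
    X = np.array([series[i:i + lags] for i in range(len(series) - lags)])
    y = series[lags:]
    return X, y

series = np.arange(20, dtype=float)   # synthetic stand-in for PV readings
X, y = make_lagged(series, lags=12)

# Standardize features so the ranges of values are the same for all
# features, as described for the training data.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```

The same windowing applies to the Amazon stock experiments, only with 7 lag values instead of 12.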
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Environment</head><p>The Google Colab environment with a single NVIDIA Tesla T4 GPU was used for the AlexNet and ResNet50 experiments on CIFAR-100. For the GTSRB, UNet and LSTM experiments, the models were trained on a setup with two Tesla T4 GPUs. Amazon stock close predictions were performed on a Kaggle-provided CPU.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiments and results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Image classification</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.1.">CIFAR-100 with AlexNet</head><p>Inspired by Sharma's work <ref type="bibr" target="#b21">[22]</ref>, we chose AlexNet as the primary target. The main reason for choosing this architecture was that it has linear layers alongside convolution blocks. We began the experiments with the OriginalAlexNet implementation as a baseline with Tanh. Next, we experimented with changing only the linear layers: changing one layer, then changing both. Instead of applying a single activation function, we applied 2 or 3 in cyclic order. The best results were achieved with the Tanh and Softmax combination of functions, a 1.14% improvement in testing accuracy compared to the ReLU baseline; however, the Tanh baseline was still superior.</p><p>Later, we expanded the experimentation by modifying the convolutional (CNN) layers. Here, the implementation consisted of changing activation functions per channel. This showed marginally better results than the OriginalAlexNet with ReLU, a 0.36% improvement.</p><p>For the experiments, the hyperparameters were the following: learning rate - 0.0001, batch size - 256 and number of epochs - 40. </p></div>
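The "cyclic order" assignment of activation functions over a layer's neurons can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation; ReLU and Tanh are used here since a per-scalar Softmax is degenerate:

```python
import numpy as np
from itertools import cycle

def relu(x):
    return np.maximum(0.0, x)

def cyclic_activations(h, funcs):
    # Neuron 0 gets the first function, neuron 1 the second, neuron 2
    # the first again, and so on around the cycle.
    it = cycle(funcs)
    return np.array([next(it)(v) for v in h])

h = np.array([-1.0, -1.0, 2.0, 2.0])   # toy pre-activations
z = cyclic_activations(h, [relu, np.tanh])
```

With 3 functions in the cycle instead of 2, the assignment simply wraps around every third neuron, which is how the "2 or 3 in cyclic order" variants differ.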
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.2.">CIFAR-100 with ResNet50</head><p>We have also investigated residual network blocks, using the ResNet50 architecture (see Table <ref type="table" target="#tab_2">3</ref>). The hyperparameters used for the experiment were: learning rate - 0.0001, batch size - 256 and number of epochs - 12. As can be seen from the results, only a combination of three functions (Tanh, Softmax and ReLU) managed to outperform the baseline model with ReLU, by a 0.66% margin. Other combinations performed below the baseline.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.3">GTSRB with TuNet</head><p>Images for classification are pre-processed in the same manner and with the same training parameters as in the previous experiments; the fixed image size is 32 by 32 pixels. The training parameters for TuNet are as follows: optimizer - Adam, learning rate - 0.001, loss function - cross entropy and batch size - 32. As can be seen in Table <ref type="table" target="#tab_3">4</ref>, the results of the TuNet baseline are generally worse than those of the modified architecture. In the table, several different models can be seen:</p><p>• TuNet - the baseline model.</p><p>• TuNetOnlyNN - a model where convolution has one activation function and the linear layers have a specific activation function for each neuron.</p><p>• TuNetPerNeuronAndChannel - a model where convolution layers have a specific activation function for each channel and a specific activation for each neuron in the linear layer.</p><p>We can see a very slight improvement when different activations are applied to only the linear layer.</p></div>
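The per-channel variant used by the TuNetPerNeuronAndChannel models can be illustrated with a small sketch; the feature map and the function list below are toy values, not the model's actual tensors:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def per_channel_activations(fmap, funcs):
    # fmap has shape (channels, H, W); instead of one activation for the
    # whole convolution output, each channel gets its own function,
    # cycling through the list along the channel axis.
    return np.stack([funcs[c % len(funcs)](fmap[c])
                     for c in range(fmap.shape[0])])

fmap = np.full((2, 3, 3), -1.0)   # toy 2-channel feature map
out = per_channel_activations(fmap, [relu, np.tanh])
```

Here channel 0 is clipped to zero by ReLU while channel 1 keeps a negative Tanh response, which is the kind of per-channel diversity the modified convolution layers introduce.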
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Cityscapes with UNet</head><p>For the image segmentation task, the popular Cityscapes dataset was chosen alongside the UNet model. The following parameters were the same for all the experiments using UNet: Adam optimizer with a learning rate of 0.001, mean-squared error as the loss function, a batch size of 4 and 20 epochs of training.</p><p>As can be seen from the results of the experiments, a significant Dice metric increase of about 10% was achieved by various activation function combinations (see Table <ref type="table" target="#tab_4">5</ref>). Using almost any combination of activation functions can result in better prediction results in the case of UNet. It is also observed that even changing the activation in the baseline model from ReLU to Tanh improved the results by a significant amount.</p></div>
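For binary masks, the Dice metric used to score the segmentations is 2|A∩B| / (|A| + |B|); a minimal sketch, with a small epsilon (an implementation convenience, not from the paper) to avoid division by zero on empty masks:

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    # Dice coefficient on binary masks: 1.0 is a perfect overlap,
    # 0.0 means no overlap at all.
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + eps)

a = np.array([[1, 1], [0, 0]])
b = np.array([[1, 0], [0, 0]])
# intersection = 1, |a| = 2, |b| = 1, so dice(a, b) is about 2/3
```
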
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Time series regression/forecasting</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.1">Simple NN on Amazon stock prediction</head><p>Experiments were performed on the Amazon stock time series data to predict the closing price for the next day. An architecture named SimpleNN was used. It is a neural network with 1 input cell, 14 hidden layer cells and 1 output. The following parameters were used in the experiment: optimizer - Adam, learning rate - 0.001, loss function - mean-squared error, batch size - 16, lag values - 7 and number of training epochs - 5.</p><p>The experiment compares the same model and architecture, the only difference being activations per neuron versus one activation for the whole network (see Table <ref type="table" target="#tab_5">6</ref>). As can be seen from the results, there is an increase in accuracy in certain cases, and it can also be observed that finding the best possible set of activation functions yielded the best results out of the experiments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.2">Custom PV dataset with LSTM</head><p>Experiments were performed using a time series dataset for forecasting PV generation. An LSTM model was used, as it is often utilized for solving PV generation forecast tasks <ref type="bibr" target="#b22">[23,</ref><ref type="bibr" target="#b23">24,</ref><ref type="bibr" target="#b24">25,</ref><ref type="bibr" target="#b25">26,</ref><ref type="bibr" target="#b26">27]</ref>. For performing the forecasts, the output of the previous step is used as the input of the following training step. The following parameters were used for the experiments: Adam optimizer with a learning rate of 0.001, mean-squared error as the error metric, a batch size of 8, 12 lag values for the PV data, and 20 training epochs.</p><p>The parameters were chosen based on trials performed with different sets of parameters. The batch size refers to the number of predictions retrieved from the model output, and the lag values refer to the number of previous predictions used as input for the next prediction. Based on tests using different lag values, a value of 12 was found to be one of the best values for this parameter, although this parameter did not seem to have much impact on the accuracy of the predictions. Regarding data transformations, the training data was standardized so that the ranges of values would be the same for all features. As can be seen from Table <ref type="table" target="#tab_6">7</ref>, there is no significant improvement based on the testing RMSLE. Although many experiments yielded similar results to the baseline, not a single experiment yielded better results than the baseline. 
It can also be observed that an increase in the number of different activation functions used does not improve the forecast results either.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 .</head><label>1</label><figDesc>Figure 1. A simple neural network with different activation functions per neuron</figDesc><graphic coords="2,184.30,500.10,229.60,98.65" type="bitmap" /></figure>
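The RMSLE used to evaluate the PV forecasts is the root mean squared error computed on log-transformed values, sqrt(mean((log(1+pred) - log(1+actual))^2)); a minimal sketch:

```python
import numpy as np

def rmsle(pred, actual):
    # log1p keeps the metric defined for zero values, which matters for
    # PV generation series that drop to zero at night.
    return np.sqrt(np.mean((np.log1p(pred) - np.log1p(actual)) ** 2))

pred = np.array([2.0, 5.0, 9.0])
actual = np.array([2.0, 5.0, 9.0])
# identical series give an error of exactly 0
```

Because of the logarithm, RMSLE penalizes relative rather than absolute deviations, so large daytime generation values do not dominate the score.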
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2.1.</head><label>21</label><figDesc>Figure 2.1. Different activation functions per channel, 2.2. A different activation function for each matrix column. In regular CNN architectures there is often only one activation function in a convolution layer. As displayed in the diagram in Figure 2.1., a different activation function can be applied to each channel after the convolution layer. The second diagram, Figure 2.2., refers to another idea: applying multiple activation functions, one for each matrix column. In this case of a 3x3 matrix, there are 3 columns in each channel. Every slice has a specific activation applied to it.</figDesc><graphic coords="3,139.40,133.85,309.70,129.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3.</head><label>3</label><figDesc>Figure 3. One activation for convolution layers and different activation functions in the linear layer. Some CNN architectures have a linear neuron layer, which typically has only one activation function. The idea displayed in Figure 3 is to leave one activation in the convolution layers and only have multiple activation functions in the linear neuron layers, specifically an activation function for each neuron. As displayed in the diagram, boxes (1-4) can each have a specific function assigned, creating a spectrum of variations: (1-tanh, 2-relu, 3-sigmoid, 4-softmax), (1-relu, 2-tanh, 3-sigmoid, 4-relu) and so on.</figDesc><graphic coords="3,161.00,380.35,261.40,126.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 .</head><label>4</label><figDesc>Figure 4. Scatter plots between PV generation data and surface solar radiation and air temperature.</figDesc><graphic coords="5,149.40,310.00,310.25,113.85" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Features used in the dataset, their data providers and measurement units</figDesc><table><row><cell>Feature name</cell><cell>Data provider</cell><cell>Measurement units</cell></row><row><cell>Generated power</cell><cell>-</cell><cell>kW</cell></row><row><cell>Air temperature</cell><cell>LHS</cell><cell>°C</cell></row><row><cell>Sea level pressure</cell><cell>LHS</cell><cell>hPa</cell></row><row><cell>Relative humidity</cell><cell>LHS</cell><cell>%</cell></row><row><cell>Wind speed</cell><cell>LHS</cell><cell>m/s</cell></row><row><cell>Wind gust speed</cell><cell>LHS</cell><cell>m/s</cell></row><row><cell>Is wind from north (true / false)</cell><cell>LHS</cell><cell>-</cell></row><row><cell>Is wind from south (true / false)</cell><cell>LHS</cell><cell>-</cell></row><row><cell>Is wind from west (true / false)</cell><cell>LHS</cell><cell>-</cell></row><row><cell>Surface solar radiation</cell><cell>Oikolab</cell><cell>W/m²</cell></row><row><cell>Total cloud cover</cell><cell>Oikolab</cell><cell>%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Results from the AlexNet experiments.</figDesc><table><row><cell>Training</cell><cell>Activations</cell><cell>Training time, min</cell><cell>Training accuracy</cell><cell>Validation accuracy</cell><cell>Testing accuracy</cell></row><row><cell>OriginalAlexNet</cell><cell>ReLU</cell><cell>34.75</cell><cell>81.209</cell><cell>36.64</cell><cell>36.95</cell></row><row><cell>OriginalAlexNetb</cell><cell>Tanh</cell><cell>26.11</cell><cell>84.216</cell><cell>43.060</cell><cell>43.18</cell></row><row><cell>AlexNetCustomLinear2a</cell><cell>Tanh, Softmax</cell><cell>35.03</cell><cell>81.473</cell><cell>37.84</cell><cell>37.21</cell></row><row><cell>AlexNetCustomLinear2b</cell><cell>Tanh, Softmax</cell><cell>36.77</cell><cell>82.46</cell><cell>36.68</cell><cell>38.36</cell></row><row><cell>AlexNetCustomLinear2r</cell><cell>random list</cell><cell>36.11</cell><cell>82.316</cell><cell>37.2</cell><cell>37.41</cell></row><row><cell>AlexNetCustomCNNa</cell><cell>Tanh, Softmax</cell><cell>35.76</cell><cell>82.427</cell><cell>37.32</cell><cell>37.31</cell></row><row><cell>AlexNetCustomCNNb</cell><cell>Tanh, Softmax</cell><cell>35.73</cell><cell>81.502</cell><cell>36.62</cell><cell>37.31</cell></row><row><cell>AlexNetCustomCNNr</cell><cell>random list</cell><cell>35.26</cell><cell>80.767</cell><cell>38.16</cell><cell>37.17</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 .</head><label>3</label><figDesc>Results from the ResNet50 experiments.</figDesc><table><row><cell>Training</cell><cell>Activations</cell><cell>Training time, min</cell><cell>Training accuracy</cell><cell>Validation accuracy</cell><cell>Testing accuracy</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4.</head><label>4</label><figDesc>Results from TuNet experiments.</figDesc><table><row><cell>Model</cell><cell>Activations</cell><cell>Epoch</cell><cell>Training time (1 epoch), ms</cell><cell>Training accuracy</cell><cell>Validation accuracy</cell></row><row><cell>TuNet (baseline)</cell><cell>Tanh</cell><cell>8</cell><cell>7007.23</cell><cell>0.9973</cell><cell>0.9834</cell></row><row><cell>TuNet</cell><cell>ReLU</cell><cell>10</cell><cell>7066.44</cell><cell>0.9721</cell><cell>0.9599</cell></row><row><cell>TuNetOnlyNN(Tanh)</cell><cell>ReLU, Tanh</cell><cell>10</cell><cell>16265.21</cell><cell>0.9990</cell><cell>0.9863</cell></row><row><cell>TuNetOnlyNN(Tanh)</cell><cell>Tanh, Softplus</cell><cell>9</cell><cell>18699.11</cell><cell>0.9961</cell><cell>0.9837</cell></row><row><cell>TuNetOnlyNN(Tanh)</cell><cell>ReLU, Tanh, Softplus</cell><cell>10</cell><cell>18615.31</cell><cell>0.9943</cell><cell>0.9851</cell></row><row><cell>TuNetOnlyNN(Tanh)</cell><cell>ReLU, Tanh, ELU</cell><cell>10</cell><cell>16559.02</cell><cell>0.9945</cell><cell>0.9849</cell></row><row><cell>TuNetPerNeuronAndChannel</cell><cell>ReLU, Tanh</cell><cell>8</cell><cell>18736.37</cell><cell>0.9945</cell><cell>0.9800</cell></row><row><cell>TuNetPerNeuronAndChannel</cell><cell>Tanh, Sigmoid</cell><cell>10</cell><cell>17864.02</cell><cell>0.9939</cell><cell>0.9809</cell></row><row><cell>TuNetPerNeuronAndChannel</cell><cell>Tanh, Softplus</cell><cell>10</cell><cell>21888.24</cell><cell>0.9929</cell><cell>0.9813</cell></row><row><cell>TuNetPerNeuronAndChannel</cell><cell>ReLU, Tanh, ELU</cell><cell>9</cell><cell>19664.91</cell><cell>0.9931</cell><cell>0.9836</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5.</head><label>5</label><figDesc>Results from UNet experiments.</figDesc><table><row><cell>Model</cell><cell>Activations</cell><cell>Epoch</cell><cell>Training time, ms</cell><cell>Train. dice</cell><cell>Valid. dice</cell></row><row><cell>UNet</cell><cell>ReLU</cell><cell>10</cell><cell>1378448.12</cell><cell>0.4700</cell><cell>0.4062</cell></row><row><cell>UNet</cell><cell>Tanh</cell><cell>10</cell><cell>1380602.75</cell><cell>0.4680</cell><cell>0.4334</cell></row><row><cell>UNetPerNeuron</cell><cell>ReLU, Tanh</cell><cell>10</cell><cell>4429268.50</cell><cell>0.4747</cell><cell>0.4293</cell></row><row><cell>UNetPerNeuron</cell><cell>Tanh, ReLU</cell><cell>10</cell><cell>4430903.50</cell><cell>0.4656</cell><cell>0.4884</cell></row><row><cell>UNetPerNeuron</cell><cell>Tanh, Softmax</cell><cell>10</cell><cell>4487534.50</cell><cell>0.3716</cell><cell>0.3389</cell></row><row><cell>UNetPerNeuronAndChannel</cell><cell>ReLU, Tanh</cell><cell>10</cell><cell>4487183.00</cell><cell>0.4714</cell><cell>0.5013</cell></row><row><cell>UNetPerNeuronAndChannel</cell><cell>ReLU, Softmax</cell><cell>10</cell><cell>4600614.00</cell><cell>0.3733</cell><cell>0.4442</cell></row><row><cell>UNetPerNeuronAndChannel</cell><cell>Tanh, Softmax</cell><cell>10</cell><cell>4539303.00</cell><cell>0.3696</cell><cell>0.4242</cell></row><row><cell>UNetPerNeuronAndChannel</cell><cell>Tanh, Softplus</cell><cell>10</cell><cell>4526773.00</cell><cell>0.4697</cell><cell>0.4453</cell></row><row><cell>UNetPerNeuronAndChannel</cell><cell>Tanh, Softplus</cell><cell>8</cell><cell>3621696.25</cell><cell>0.4685</cell><cell>0.4958</cell></row><row><cell>UNetPerNeuronAndChannel</cell><cell>Tanh, ReLU, Softplus</cell><cell>10</cell><cell>4516755.50</cell><cell>0.4709</cell><cell>0.4468</cell></row><row><cell>UNetPerNeuronAndChannel</cell><cell>Tanh, ReLU, Softplus</cell><cell>9</cell><cell>4065430.75</cell><cell>0.4700</cell><cell>0.5081</cell></row><row><cell>UNetPerNeuronAndChannel</cell><cell>ReLU, Tanh, ELU</cell><cell>10</cell><cell>4525098.50</cell><cell>0.4696</cell><cell>0.4339</cell></row><row><cell>UNetPerNeuronAndChannel</cell><cell>ReLU, Tanh, ELU</cell><cell>7</cell><cell>3169012.25</cell><cell>0.4646</cell><cell>0.4654</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 6.</head><label>6</label><figDesc>Testing results of SimpleNN and PerNeuron models. Additionally, all possible combinations of different activation function sets have been tested (see model PerNeuronList).</figDesc><table><row><cell>Model</cell><cell>Activations</cell><cell>MAE</cell><cell>RMSE</cell><cell>RMSLE</cell></row><row><cell>SimpleNN</cell><cell>ReLU (baseline)</cell><cell>2.8582</cell><cell>3.7894</cell><cell>0.0312</cell></row><row><cell>SimpleNN</cell><cell>Tanh</cell><cell>2.8583</cell><cell>3.9185</cell><cell>0.0316</cell></row><row><cell>PerNeuron</cell><cell>Tanh, ReLU</cell><cell>3.0003</cell><cell>4.0790</cell><cell>0.0332</cell></row><row><cell>PerNeuron</cell><cell>ReLU, Tanh</cell><cell>3.0899</cell><cell>4.1825</cell><cell>0.0343</cell></row><row><cell>PerNeuron</cell><cell>ReLU, ReLU, Sigmoid</cell><cell>2.7314</cell><cell>3.6951</cell><cell>0.0301</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 7.</head><label>7</label><figDesc>Results from LSTM experiments.</figDesc><table><row><cell>Model</cell><cell>Activations</cell><cell>Epochs</cell><cell>Training MAE</cell><cell>Test MAE</cell><cell>Test RMSE</cell><cell>Test RMSLE</cell><cell>Time (ms)</cell></row><row><cell>LSTM</cell><cell>Default (Tanh, Sigmoid)</cell><cell>20</cell><cell>0.0563</cell><cell>0.0757</cell><cell>0.1262</cell><cell>0.070</cell><cell>197461.00</cell></row><row><cell>LSTM</cell><cell>Tanh, Softmax</cell><cell>20</cell><cell>0.0565</cell><cell>0.0867</cell><cell>0.1412</cell><cell>0.0806</cell><cell>3275714.00</cell></row><row><cell>LSTM</cell><cell>ELU, Sigmoid</cell><cell>20</cell><cell>0.2056</cell><cell>0.2113</cell><cell>0.2882</cell><cell>0.1734</cell><cell>3259991.50</cell></row><row><cell>LSTM</cell><cell>Sigmoid, ELU</cell><cell>20</cell><cell>0.1792</cell><cell>0.1863</cell><cell>0.2516</cell><cell>0.1727</cell><cell>3271329.50</cell></row><row><cell>LSTM</cell><cell>Sigmoid, Tanh</cell><cell>20</cell><cell>0.0533</cell><cell>0.0857</cell><cell>0.1420</cell><cell>0.0783</cell><cell>3096815.50</cell></row><row><cell>LSTM</cell><cell>Sigmoid, Tanh</cell><cell>8</cell><cell>0.0693</cell><cell>0.0782</cell><cell>0.1305</cell><cell>0.0721</cell><cell>1248338.12</cell></row><row><cell>LSTM</cell><cell>Sigmoid, Softmax</cell><cell>20</cell><cell>0.0740</cell><cell>0.0798</cell><cell>0.1317</cell><cell>0.0741</cell><cell>3708114.00</cell></row><row><cell>LSTM</cell><cell>ELU, Sigmoid, Tanh</cell><cell>20</cell><cell>0.1748</cell><cell>0.1823</cell><cell>0.2469</cell><cell>0.1615</cell><cell>2843427.75</cell></row><row><cell>LSTM</cell><cell>ELU, Tanh, Sigmoid</cell><cell>20</cell><cell>0.1817</cell><cell>0.2553</cell><cell>0.1796</cell><cell>0.1542</cell><cell>2818499.50</cell></row><row><cell>LSTM</cell><cell>Softmax, Sigmoid, Tanh</cell><cell>20</cell><cell>0.0606</cell><cell>0.0791</cell><cell>0.1315</cell><cell>0.0726</cell><cell>3155361.75</cell></row><row><cell>LSTM</cell><cell>Softmax, Tanh, Sigmoid</cell><cell>20</cell><cell>0.0604</cell><cell>0.0814</cell><cell>0.1331</cell><cell>0.0756</cell><cell>3171810.00</cell></row></table></figure>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Tabular</head><p>Tabular data is still widely used in machine learning tasks. In this paper we chose two datasets to experiment with: Iris flowers and Breast cancer classification. Both experiments used the following training parameters: SGD optimizer, learning rate 0.01, cross-entropy loss, and 200 training epochs.</p><p>The results in Table <ref type="table">8</ref>, which compare a single activation function against multiple activations on the Iris flowers classification task, show no improvement over the best-suited single activation function. However, an activation function set selected from a large number of combinations achieved better accuracy than any single activation function (see Table <ref type="table">10</ref>). It should also be noted that these results surpass those of the SVM described in Obaid's work. As the results show, there is a significant accuracy increase for the PerNeuron models, with the largest gain occurring when the best activation function list is found from all possible combinations.</p></div>
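<div xmlns="http://www.tei-c.org/ns/1.0"><p>The combination search described above can be sketched in a few lines. This is our own minimal illustration, not the authors' code: the names relu, forward, and ACTIVATIONS are ours, and a real search would train and score the network under each assignment rather than merely enumerating them.</p><p>
```python
# Illustrative sketch only (not the authors' code): enumerating every
# assignment of activation functions to the neurons of one small dense
# layer, in the spirit of the PerNeuronList search over all combinations.
import itertools
import math

def relu(x):
    return x if x > 0 else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

ACTIVATIONS = {"relu": relu, "tanh": math.tanh, "sigmoid": sigmoid}

def forward(weights, inputs, assignment):
    """One dense layer where neuron i uses the activation named in assignment[i]."""
    outputs = []
    for neuron_weights, name in zip(weights, assignment):
        pre_activation = sum(w * x for w, x in zip(neuron_weights, inputs))
        outputs.append(ACTIVATIONS[name](pre_activation))
    return outputs

# Every assignment of 3 candidate activations to a 3-neuron layer: 3**3 = 27.
assignments = list(itertools.product(ACTIVATIONS, repeat=3))
print(len(assignments))  # 27

# Example: neuron 0 uses ReLU, neuron 1 uses tanh.
print(forward([[1.0, 0.0], [0.0, 1.0]], [0.5, -2.0], ("relu", "tanh")))
```
Because the number of assignments grows exponentially with the number of neurons, such an exhaustive search is practical only for small layers, such as those used in the tabular experiments.</p></div>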
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions and discussion</head><p>This paper explores the concept of using multiple activation functions in artificial neural networks. It discusses the role of activation functions in introducing the non-linearity needed to improve task accuracy, and investigates different approaches to incorporating multiple activation functions, including assigning a different function to each neuron or channel.</p><p>The experiments used models such as AlexNet, ResNet50, TuNet, and SimpleNN. In the AlexNet experiment, different activation function combinations were tested in both the linear layers and the convolutional (CNN) layers. The results showed that OriginalAlexNet with the Tanh activation function yielded the best overall performance. In the ResNet50 experiments, one combination performed marginally better than any of the single-function baselines. The TuNet and SimpleNN experiments evaluated these architectures on their respective datasets. Overall, the experiments provided insights into the impact of activation function combinations on model performance, with modest improvements observed compared to using a single activation function. The datasets used in the experiments were CIFAR-100, GTSRB, Breast Cancer Wisconsin (Diagnostic), Iris flowers, and Amazon stock prices. In image segmentation tasks, modifying the UNet architecture with different activation function combinations led to significant improvements in the Dice metric; even changing the activation function in the baseline model from ReLU to Tanh improved results. For time series regression/forecasting tasks, the experiments show that using multiple activation functions does not significantly improve prediction accuracy. The paper also hints at the idea of a learnable list of activation functions, in which each neuron would adapt its activation to the specific data it receives; this idea requires further analysis.</p><p>Overall, the paper concludes that while using multiple activation functions can be beneficial in certain scenarios, the improvements are not substantial compared to using a single activation function. The choice of activation function should be based on the specific task, the dataset, and its features.</p></div>			</div>
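<div xmlns="http://www.tei-c.org/ns/1.0"><p>The per-neuron-and-channel assignment discussed in the conclusions can be illustrated with a small sketch. This is our own hedged illustration, not the paper's implementation: the class name PerChannelActivation and the modulo routing rule (channel c uses activation c mod k) are assumptions we introduce for clarity.</p><p>
```python
# Hedged sketch (names are illustrative, not the paper's implementation):
# a "per channel" activation wrapper that routes channel c of a feature
# map through activations[c % len(activations)].
import math

def relu(x):
    return x if x > 0 else 0.0

class PerChannelActivation:
    def __init__(self, activations):
        self.activations = activations

    def __call__(self, feature_map):
        # feature_map: a list of channels, each channel a flat list of values.
        return [
            [self.activations[c % len(self.activations)](v) for v in channel]
            for c, channel in enumerate(feature_map)
        ]

act = PerChannelActivation([relu, math.tanh])
out = act([[-1.0, 2.0], [-1.0, 2.0]])
print(out[0])  # channel 0 through ReLU: [0.0, 2.0]
print(out[1])  # channel 1 through tanh
```
In a framework such as PyTorch the same routing could be wrapped in a module applied after each convolution; applying a different function per channel is what drives the longer training times reported for the PerNeuronAndChannel variants in Tables 4 and 5.</p></div>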
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Activation functions in deep learning: A comprehensive survey and benchmark</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">R</forename><surname>Dubey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">K</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">B</forename><surname>Chaudhuri</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.neucom.2022.06.111</idno>
		<ptr target="https://doi.org/10.1016/j.neucom.2022.06.111" />
	</analytic>
	<monogr>
		<title level="j">Neurocomputing</title>
		<imprint>
			<biblScope unit="volume">503</biblScope>
			<biblScope unit="page" from="92" to="108" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">RMAF: Relu-Memristor-Like Activation Function for Deep Learning</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Adu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Tashi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Anokye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Ayidzoe</surname></persName>
		</author>
		<idno type="DOI">10.1109/ACCESS.2020.2987829</idno>
		<ptr target="https://doi.org/10.1109/ACCESS.2020.2987829" />
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="72727" to="72741" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Smish: A Novel Activation Function for Deep Learning Methods</title>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Wang</surname></persName>
		</author>
		<idno type="DOI">10.3390/electronics11040540</idno>
		<ptr target="https://doi.org/10.3390/electronics11040540" />
	</analytic>
	<monogr>
		<title level="j">Electronics</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page">540</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Efficient activation functions for embedded inference engines</title>
		<author>
			<persName><forename type="first">A</forename><surname>Wuraola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Patel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">K</forename><surname>Nguang</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.neucom.2021.02.030</idno>
		<ptr target="https://doi.org/10.1016/j.neucom.2021.02.030" />
	</analytic>
	<monogr>
		<title level="j">Neurocomputing</title>
		<imprint>
			<biblScope unit="volume">442</biblScope>
			<biblScope unit="page" from="73" to="88" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Gish: a novel activation function for image classification</title>
		<author>
			<persName><forename type="first">M</forename><surname>Kaytan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">İ</forename><forename type="middle">B</forename><surname>Aydilek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Yeroğlu</surname></persName>
		</author>
		<idno type="DOI">10.1007/s00521-023-09035-5</idno>
		<ptr target="https://doi.org/10.1007/s00521-023-09035-5" />
	</analytic>
	<monogr>
		<title level="j">Neural Comput &amp; Applic</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="24259" to="24281" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">PV Power Prediction Based on LSTM With Adaptive Hyperparameter Adjustment</title>
		<author>
			<persName><forename type="first">M</forename><surname>Chai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Xia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Peng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Cui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Liu</surname></persName>
		</author>
		<idno type="DOI">10.1109/ACCESS.2019.2936597</idno>
		<ptr target="https://doi.org/10.1109/ACCESS.2019.2936597" />
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="115473" to="115486" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Influence of Different Activation Functions on Deep Learning Models in Indoor Scene Images Classification</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">S</forename><surname>Anami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">V</forename><surname>Sagarnal</surname></persName>
		</author>
		<idno type="DOI">10.1134/S1054661821040039</idno>
		<ptr target="https://doi.org/10.1134/S1054661821040039" />
	</analytic>
	<monogr>
		<title level="j">Pattern Recognition and Image Analysis</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="page" from="78" to="88" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">The Role of Activation Function in CNN</title>
		<author>
			<persName><forename type="first">W</forename><surname>Hao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Yizhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Yaqin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zhili</surname></persName>
		</author>
		<idno type="DOI">10.1109/ITCA52113.2020.00096</idno>
		<ptr target="https://doi.org/10.1109/ITCA52113.2020.00096" />
	</analytic>
	<monogr>
		<title level="m">2020 2nd International Conference on Information Technology and Computer Application (ITCA)</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="429" to="432" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Developing Novel Activation Functions Based Deep Learning LSTM for Classification</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">H</forename><surname>Essai Ali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">B</forename><surname>Abdel-Raman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">A</forename><surname>Badry</surname></persName>
		</author>
		<idno type="DOI">10.1109/ACCESS.2022.3205774</idno>
		<ptr target="https://doi.org/10.1109/ACCESS.2022.3205774" />
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="97259" to="97275" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">ImageNet Classification with Deep Convolutional Neural Networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Krizhevsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Hinton</surname></persName>
		</author>
		<idno type="DOI">10.1145/3065386</idno>
		<ptr target="https://doi.org/10.1145/3065386" />
	</analytic>
	<monogr>
		<title level="j">Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Deep Residual Learning for Image Recognition</title>
		<author>
			<persName><forename type="first">K</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sun</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.1512.03385</idno>
		<ptr target="https://doi.org/10.48550/arXiv.1512.03385" />
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">U-Net: Convolutional Networks for Biomedical Image Segmentation</title>
		<author>
			<persName><forename type="first">O</forename><surname>Ronneberger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fischer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Brox</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.1505.04597</idno>
		<ptr target="https://doi.org/10.48550/arXiv.1505.04597" />
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">A novel UNet segmentation method based on deep learning for preferential flow in soil</title>
		<author>
			<persName><forename type="first">H</forename><surname>Bai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhao</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.still.2023.105792</idno>
		<ptr target="https://doi.org/10.1016/j.still.2023.105792" />
	</analytic>
	<monogr>
		<title level="j">Soil &amp; Tillage Research</title>
		<imprint>
			<biblScope unit="volume">233</biblScope>
			<biblScope unit="page">105792</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">GCW-UNet segmentation of cardiac magnetic resonance images for evaluation of left atrial enlargement</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">K</forename><surname>Wong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">N</forename><surname>Ghista</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.cmpb.2022.106915</idno>
		<ptr target="https://doi.org/10.1016/j.cmpb.2022.106915" />
	</analytic>
	<monogr>
		<title level="j">Computer Methods and Programs in Biomedicine</title>
		<imprint>
			<biblScope unit="volume">221</biblScope>
			<biblScope unit="page" from="106915" to="106915" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">KUB-UNet: Segmentation of Organs of Urinary System from a KUB X-ray Image</title>
		<author>
			<persName><forename type="first">G</forename><surname>Rani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Thakkar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Verma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Mehta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Chavan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">S</forename><surname>Dhaka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">K</forename><surname>Sharma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Vocaturo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Zumpano</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.cmpb.2022.107031</idno>
		<ptr target="https://doi.org/10.1016/j.cmpb.2022.107031" />
	</analytic>
	<monogr>
		<title level="j">Computer Methods and Programs in Biomedicine</title>
		<imprint>
			<biblScope unit="volume">224</biblScope>
			<biblScope unit="page" from="107031" to="107031" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">Learning Multiple Layers of Features from Tiny Images</title>
		<author>
			<persName><forename type="first">A</forename><surname>Krizhevsky</surname></persName>
		</author>
		<ptr target="https://www.semanticscholar.org/paper/Learning-Multiple-Layers-of-Features-from-Tiny-Krizhevsky/5d90f06bb70a0a3dced62413346235c02b1aa086" />
		<imprint>
			<date type="published" when="2009-01-17">2009. January 17, 2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition</title>
		<author>
			<persName><forename type="first">J</forename><surname>Stallkamp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Schlipsing</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Salmen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Igel</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.neunet.2012.02.016</idno>
		<ptr target="https://doi.org/10.1016/j.neunet.2012.02.016" />
	</analytic>
	<monogr>
		<title level="j">Neural Networks</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="page" from="323" to="332" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Nuclear feature extraction for breast tumor diagnosis</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">N</forename><surname>Street</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">H</forename><surname>Wolberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">L</forename><surname>Mangasarian</surname></persName>
		</author>
		<idno type="DOI">10.1117/12.148698</idno>
		<ptr target="https://doi.org/10.1117/12.148698" />
		<editor>R.S. Acharya, D.B. Goldgof</editor>
		<imprint>
			<date type="published" when="1993">1993</date>
			<biblScope unit="page" from="861" to="870" />
			<pubPlace>San Jose, CA</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">The Iris Data Set: In Search of the Source of Virginica</title>
		<author>
			<persName><forename type="first">A</forename><surname>Unwin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kleinman</surname></persName>
		</author>
		<idno type="DOI">10.1111/1740-9713.01589</idno>
		<ptr target="https://doi.org/10.1111/1740-9713.01589" />
	</analytic>
	<monogr>
		<title level="j">Significance</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page" from="26" to="29" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Evaluating the Performance of Machine Learning Techniques in the Classification of Wisconsin Breast Cancer</title>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">Ibrahim</forename><surname>Obaid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mohammed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">K</forename><surname>Ghani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mostafa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Al-Dhief</surname></persName>
		</author>
		<idno type="DOI">10.14419/ijet.v7i4.36.23737</idno>
		<ptr target="https://doi.org/10.14419/ijet.v7i4.36.23737" />
	</analytic>
	<monogr>
		<title level="j">International Journal of Engineering and Technology</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="160" to="166" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<ptr target="https://finance.yahoo.com/quote/AMZN/history/" />
		<title level="m">Amazon.com, Inc. (AMZN) Stock Historical Prices &amp; Data - Yahoo Finance</title>
				<imprint>
			<publisher>Amazon</publisher>
			<date type="published" when="2024-01-17">2024. January 17, 2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">An Analysis Of Convolutional Neural Networks For Image Classification</title>
		<author>
			<persName><forename type="first">N</forename><surname>Sharma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Jain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mishra</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.procs.2018.05.198</idno>
		<ptr target="https://doi.org/10.1016/j.procs.2018.05.198" />
	</analytic>
	<monogr>
		<title level="j">Procedia Computer Science</title>
		<imprint>
			<biblScope unit="volume">132</biblScope>
			<biblScope unit="page" from="377" to="384" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Accurate one step and multistep forecasting of very short-term PV power using LSTM-TCN model</title>
		<author>
			<persName><forename type="first">T</forename><surname>Limouni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Yaagoubi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Bouziane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Guissi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">H</forename><surname>Baali</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.renene.2023.01.118</idno>
		<ptr target="https://doi.org/10.1016/j.renene.2023.01.118" />
	</analytic>
	<monogr>
		<title level="j">Renewable Energy</title>
		<imprint>
			<biblScope unit="volume">205</biblScope>
			<biblScope unit="page" from="1010" to="1024" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Accurate solar PV power prediction interval method based on frequency-domain decomposition and LSTM model</title>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Li</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.energy.2022.125592</idno>
		<ptr target="https://doi.org/10.1016/j.energy.2022.125592" />
	</analytic>
	<monogr>
		<title level="j">Energy</title>
		<imprint>
			<biblScope unit="volume">262</biblScope>
			<biblScope unit="page">125592</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Time series forecasting for hourly photovoltaic power using conditional generative adversarial network and Bi-LSTM</title>
		<author>
			<persName><forename type="first">X</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Shi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Liu</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.energy.2022.123403</idno>
		<ptr target="https://doi.org/10.1016/j.energy.2022.123403" />
	</analytic>
	<monogr>
		<title level="j">Energy</title>
		<imprint>
			<biblScope unit="volume">246</biblScope>
			<biblScope unit="page">123403</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Short-Term Prediction of PV Power Based on Combined Modal Decomposition and NARX-LSTM-LightGBM</title>
		<author>
			<persName><forename type="first">H</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Qiu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Fang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<idno type="DOI">10.3390/su15108266</idno>
		<ptr target="https://doi.org/10.3390/su15108266" />
	</analytic>
	<monogr>
		<title level="j">Sustainability</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="page">8266</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Forecasting of PV plant output using hybrid wavelet-based LSTM-DNN structure model</title>
		<author>
			<persName><forename type="first">J</forename><surname>Ospina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Newaz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">O</forename><surname>Faruque</surname></persName>
		</author>
		<idno type="DOI">10.1049/iet-rpg.2018.5779</idno>
		<ptr target="https://doi.org/10.1049/iet-rpg.2018.5779" />
	</analytic>
	<monogr>
		<title level="j">IET Renewable Power Generation</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page" from="1087" to="1095" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
