<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Weighted Shifted S-shaped Activation Functions</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Sergiy</forename><surname>Popov</surname></persName>
							<email>serhii.popov@nure.ua</email>
							<affiliation key="aff0">
								<orgName type="institution">Kharkiv National University of Radio Electronics</orgName>
								<address>
									<addrLine>Nauky Ave. 14</addrLine>
									<postCode>61166</postCode>
									<settlement>Kharkiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Dmytro</forename><surname>Pikhulya</surname></persName>
							<email>dmytro.pikhulia@nure.ua</email>
							<affiliation key="aff0">
								<orgName type="institution">Kharkiv National University of Radio Electronics</orgName>
								<address>
									<addrLine>Nauky Ave. 14</addrLine>
									<postCode>61166</postCode>
									<settlement>Kharkiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Weighted Shifted S-shaped Activation Functions</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">C0FB76269C34AFB79F8C71DE41132015</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:11+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Convolutional neural network</term>
					<term>Image classification</term>
					<term>Activation function</term>
					<term>Bounded activation function</term>
					<term>Activation function modifications</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, we propose a family of activation functions (AFs) that can be considered as smooth approximations of bounded ReLU and similar AFs. These AFs are constructed by using a shifted originaligned S-shaped function as a basis, and weighing it with another S-shaped function, similar to how SiLU/GELU AFs weigh the 𝑓(𝑥) = 𝑥 function. The use of both regular and adaptive variants of such AFs is explored. The performance of the proposed family of AFs is evaluated in terms of the image classification accuracy with CNN models by comparing their multiple variants with the popular existing AFs on CIFAR-10 and Fashion-MNIST datasets using Adam and stochastic gradient descent (SGD) optimizers with different learning rates. Overall, 28 variants of the proposed AFs are compared with 21 variants of popular existing AFs (including the ReLU-like functions such as ReLU, Leaky ReLU, SiLU, GELU, PReLU, Swish, etc. and some S-shaped AFs), and 6 shifted S-shaped AFs. The experiments have shown that in most cases the adaptive versions of the proposed AFs provide a pronounced image classification accuracy advantage over all existing AFs that were considered when the Adam optimizer is used, and no consistent advantage with the SGD optimizer. Further research regarding the use of these AFs with the SGD optimizer, and the use of their non-adaptive variants is required.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Artificial intelligence (AI) as a field that strives to replicate different kinds of cognitive functions pertaining to humans inevitably has to deal with many kinds of computer vision (CV) tasks. The fact that CV-related tasks represent a broad part of AI tasks is a natural consequence of the fact that humans strongly rely on vision in many of their daily routines, which are in turn eventually being targeted for solution by AI. Convolutional neural networks (CNNs) are a class of artificial neural networks that proved to be very effective at solving many CV-related tasks, such as image classification, object detection, semantic segmentation, etc.</p><p>The performance of neural network models can vary depending on many factors such as the task being solved, the network's architecture being used, scale of the network, hyperparameters involved in tuning the model, etc. The choice of activation functions (AFs) is one of such hyperparameters that can significantly influence the network's capability to perform a certain task. In case of CNNs, like in many other cases, ReLU AF is a popular choice, along with other AFs that can be said as being its variations, such as Leaky ReLU <ref type="bibr" target="#b0">[1]</ref>, SiLU <ref type="bibr" target="#b1">[2]</ref>, GELU <ref type="bibr" target="#b2">[3]</ref>, ELU <ref type="bibr" target="#b3">[4]</ref>, etc. In this paper, such functions are called ReLU-like ones for convenience.</p><p>The ReLU-like functions effectively solve the vanishing gradient problem <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6]</ref>, which is a typical problem with S-shaped AFs <ref type="bibr" target="#b5">[6]</ref>. 
In many cases, using ReLU-like functions leads to better model effectiveness in image classification tasks with CNNs than S-shaped functions like Sigmoid and Tanh <ref type="bibr" target="#b6">[7]</ref>.</p><p>It has been shown that bounding the ReLU function can benefit training stability and classification accuracy, with functions like BReLU and BLReLU <ref type="bibr" target="#b7">[8]</ref>. At the same time, some improved variants of ReLU (LReLU, GELU, SiLU, PReLU) can show better results, but these functions are not bounded; hence there is potential in exploring bounded versions of such functions to see whether this could produce a cumulative improvement effect leading to better results than any of these functions.</p><p>The way that BReLU or similar functions (ReLU-6 <ref type="bibr" target="#b8">[9]</ref>) are bounded causes the function to have a fixed value with zero derivative beyond a certain argument value, which could degrade the network's training process. Alternative functions, which are formally not bounded but still limit the function's growth beyond a certain argument value, include BLReLU <ref type="bibr" target="#b7">[8]</ref> and PLU <ref type="bibr" target="#b9">[10]</ref>. The BReLU, BLReLU, and PLU AFs are piecewise linear functions, though. In this work, it is assumed that smoothing the transitions in such functions, by replacing the piecewise linear functions with smoother approximations, could improve the network's overall approximation capabilities. The intuition behind this assumption is that real-world data is presumably diverse enough to have few or no hard edges in the distributions of most of its aspects.</p><p>Among the functions that can be useful for creating smooth approximations of bounded ReLU-like functions are the shifted S-shaped functions. 
The work <ref type="bibr" target="#b10">[11]</ref> shows that the modified version of the Tanh function, which is shifted horizontally and vertically while still maintaining an intersection with the origin, called Shifted Tanh, achieves a better performance than the Tanh function, and can show a performance that is similar or slightly higher than that of the ReLU AF.</p><p>In this paper, we take a further look at such shifted S-shaped functions by first evaluating the performance of shifted variants of the Atan and Asinh functions, and then modifying them to represent better smoothed approximations of bounded ReLU-like functions. In this paper the respective shifted AFs conventionally have the "So" prefix added to them (meaning "shifted, originaligned"): SoTanh (same as Shifted Tanh in <ref type="bibr" target="#b10">[11]</ref>), SoAtan, SoAsinh. Regarding the Asinh function in particular, it is worth noting that unlike most S-shaped functions, it is an unbounded one, which could potentially be a useful property for mitigating the vanishing gradient problem as it tends to have higher first derivative values for a wider range of 𝑥 than the bounded functions like Tanh and Atan.</p><p>More specifically, shifting an S-shaped function to the right is seen to potentially be beneficial due to the following reasons:</p><p>• This makes the negative function's part closer to the 𝑥 axis, similar to the shape of ReLU-like functions. It is informally assumed in this work that the proximity of ReLU-like functions to 0 plays a certain role in their effectiveness with CNNs. One of the explanations might be the assumption that such AF's property encourages learning sparse representations of network's inputs <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b10">11]</ref>. 
• It gives the function's positive part a longer range where its value stays close to the 𝑓(𝑥) = 𝑥 function, compared to the regular unshifted version of the same S-shaped function. This also makes its shape closer to that of the Bounded ReLU or BLReLU AFs, while retaining smooth transitions. The proximity of the positive part to the 𝑓(𝑥) = 𝑥 function is hypothesized to help prevent both vanishing and exploding gradients in deep networks.</p><p>After testing the performance of the SoTanh, SoAtan, and SoAsinh functions, we introduce their modified versions, which make them closer to bounded ReLU-like functions. One notable difference of the SoTanh, SoAtan, and SoAsinh AFs from functions like ReLU, SiLU, and GELU is that their negative part is notably farther away from the 𝑥 axis. Hence, it is hypothesized that these shifted S-shaped functions might fail to introduce the activation sparsity seen with ReLU/SiLU/GELU, so there could be potential for improving their performance by "pushing" their negative part closer to the 𝑥 axis. Since we strive to create smooth approximations of bounded ReLU-like functions, we explore the same method of "pushing" the negative part toward the 𝑥 axis as the one used by the SiLU and GELU functions. As a result, we create shifted S-shaped functions that are weighted by another S-shaped function with a range of (0; 1) and a value of 0.5 at 𝑥 = 0. In this paper, we call such a family of functions weighted shifted origin-aligned S-shaped functions (WSoS functions). 
By using two variants of weight functions and three variants of base S-shaped functions, in this work we introduce and investigate the following specific WSoS AFs: SiSoTanh, SiSoAtan, SiSoAsinh, GeSoTanh, GeSoAtan, GeSoAsinh (see section 2.2).</p><p>Since bounded functions are prone to causing the vanishing gradient problem, in this work we try to mitigate this problem by extending the range in the function's positive part where its values stay close to the 𝑓(𝑥) = 𝑥 function. We do this by introducing scaling parameters. In addition, one more adjustable parameter defines the amount by which the base S-shaped function is shifted along the 𝑥 axis.</p><p>Finding a good combination among permutations of all the AF's parameters can be hard, so we first perform experiments with certain fixed parameter values, and then make these parameters trainable by creating adaptive versions of these AFs. For the adaptive versions, we share the parameters across the entire model rather than introducing a different trainable parameter set for each of the network's neurons.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Method</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Shifted origin-aligned S-shaped AFs</head><p>The notion of shifted origin-aligned S-shaped functions is not new (the Shifted Tanh function was explored in <ref type="bibr" target="#b10">[11]</ref>), but we include them into the comparison to see how the AFs proposed in this work stack up against them along with other existing AFs. Besides, in addition to the Shifted Tanh function (which is called SoTanh in this work for brevity and consistency with the proposed AFs), this work also explores the shifted versions of Atan and Asinh functions, which follow the same pattern, and evaluates the performance of their adaptive variants.</p><p>The general form of such shifted origin-aligned S-shaped functions 𝑆𝑜(𝑥) used in this work can be described with the following formula:</p><formula xml:id="formula_0">𝑆𝑜(𝑥) = 𝑆(𝑥 − 𝛼) + 𝑆(𝛼),<label>(1)</label></formula><p>where S is an arbitrary S-shaped function, which is also called a base function in this paper, and α is a value by which the function is shifted horizontally. This results in the following three AFs, which represent the shifted versions of Tanh, Atan, and Asinh functions (see Table <ref type="table">1</ref>). Examples of such AFs can be seen in Figure <ref type="figure" target="#fig_0">1</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1 Shifted origin-aligned AFs evaluated in this work</head><p>Base function Shifted, origin-aligned AF formula Tanh 𝑆𝑜𝑇𝑎𝑛ℎ(𝑥) = 𝑇𝑎𝑛ℎ(𝑥 − 𝛼) + 𝑇𝑎𝑛ℎ(𝛼)</p><formula xml:id="formula_1">Atan 𝑆𝑜𝐴𝑡𝑎𝑛(𝑥) = 𝐴𝑡𝑎𝑛(𝑥 − 𝛼) + 𝐴𝑡𝑎𝑛(𝛼) Asinh 𝑆𝑜𝐴𝑠𝑖𝑛ℎ(𝑥) = 𝐴𝑠𝑖𝑛ℎ(𝑥 − 𝛼) + 𝐴𝑠𝑖𝑛ℎ(𝛼)</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Weighted shifted origin-aligned S-shaped AFs</head><p>The family of AFs proposed in this work (by convention called as WSoS AFs family in this paper) contains the modifications of shifted origin-aligned S-shaped functions (see section 2.1), whose negative part is softly pushed closer to the 𝑥 axis by weighing them with another S-shaped function, as can be described in a general form by this formula:</p><formula xml:id="formula_2">𝑊𝑆𝑜(𝑥) = 𝛾𝑊(𝑥)𝛽𝑆𝑜( 1 𝛽 𝑥),<label>(2)</label></formula><p>where 𝛾 is a function's vertical scaling parameter. 𝑊(𝑥) is any S-shaped function in range (0; 1) symmetric with respect to the point (0, 0.5).</p><p>𝛽 is a horizontal and vertical scaling parameter for the 𝑆𝑜 function. 𝑆𝑜(𝑥) is a shifted origin-aligned S-shaped function <ref type="bibr" target="#b0">(1)</ref>.  <ref type="table">2</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2 Variants of the weight function W(x) used in this work</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Weight function Formula</head><p>Si</p><formula xml:id="formula_3">𝑆𝑖(𝑥) = 𝜎(𝑥) = 1 1 + 𝑒 !" Ge 𝐺𝑒(𝑥) = 1 2 (𝐸𝑟𝑓(𝑥) + 1)</formula><p>With the three variants of shifted S-shaped functions listed in Table <ref type="table">1</ref>, this results in 6 specific AFs that belong to the class of WSoS AFs, which are explored in this work (see Table <ref type="table">3</ref>).</p><p>In an attempt to identify some concrete efficient AF variants, each of these functions is tested with several different sets of 𝛼, 𝛽, and 𝛾 parameters. Some examples of these functions with different values of α, β, 𝛾 parameters can be seen in Figure <ref type="figure">2</ref> and Figure <ref type="figure">3</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 3 The proposed WSoS AFs</head><p>Activation function Formula </p><formula xml:id="formula_4">SiSoTanh 𝑆𝑖𝑆𝑜𝑇𝑎𝑛ℎ(𝑥) = 𝛾𝛽𝜎(𝑥) B𝑇𝑎𝑛ℎ B 1 𝛽 𝑥 − 𝛼C + 𝑇𝑎𝑛ℎ(𝛼)C SiSoAtan 𝑆𝑖𝑆𝑜𝐴𝑡𝑎𝑛(𝑥) = 𝛾𝛽𝜎(𝑥) B𝐴𝑡𝑎𝑛 B 1 𝛽 𝑥 − 𝛼C + 𝐴𝑡𝑎𝑛(𝛼)C SiSoAsinh 𝑆𝑖𝑆𝑜𝐴𝑠𝑖𝑛ℎ(𝑥) = 𝛾𝛽𝜎(𝑥) B𝐴𝑠𝑖𝑛ℎ B 1 𝛽 𝑥 − 𝛼C + 𝐴𝑠𝑖𝑛ℎ(𝛼)C GeSoTanh 𝐺𝑒𝑆𝑜𝑇𝑎𝑛ℎ(𝑥) = 1 2 𝛾𝛽(𝐸𝑟𝑓(𝑥) + 1) B𝑇𝑎𝑛ℎ B 1 𝛽 𝑥 − 𝛼C + 𝑇𝑎𝑛ℎ(𝛼)C GeSoAtan 𝐺𝑒𝑆𝑜𝐴𝑡𝑎𝑛(𝑥) = 1 2 𝛾𝛽(𝐸𝑟𝑓(𝑥) + 1) B𝐴𝑡𝑎𝑛 B 1 𝛽 𝑥 − 𝛼C + 𝐴𝑡𝑎𝑛(𝛼)C GeSoAsinh 𝐺𝑒𝑆𝑜𝐴𝑠𝑖𝑛ℎ(𝑥) = 1 2 𝛾𝛽(𝐸𝑟𝑓(𝑥) + 1) B𝐴𝑠𝑖𝑛ℎ B 1 𝛽 𝑥 − 𝛼C + 𝐴𝑠𝑖𝑛ℎ(𝛼)C</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Adaptive AF variants</head><p>In addition to the functions mentioned in Table <ref type="table">1</ref> and Table <ref type="table">3</ref>, this work considers respective adaptive variants of these AFs, which use the same AF formulas, but treat α, β, 𝛾 as trainable parameters, which are shared across the whole model. The resulting adaptive variants of shifted origin-aligned S-shaped AFs: ASoTanh, ASoAtan, ASoAsinh. The adaptive variants of the WSoS AFs (later called AWSoS functions for conciseness) are as follows: ASiSoTanh, ASiSoAtan, ASiSoAsinh, AGeSoTanh, AGeSoAtan, AGeSoAsinh.</p><p>The ASoTanh, ASoAtan, ASoAsinh functions are tested with one variant of the initial α parameter's value for each AF, and the ASiSoTanh, ASiSoAtan, ASiSoAsinh, AGeSoTanh, AGeSoAtan, AGeSoAsinh AFs are tested with several sets of initial parameter values.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Experimental setup</head><p>All activation functions are tested and compared on the image classification task with various CNN models, datasets, and hyperparameters. Below are the details on the respective experiments that are performed.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.1.">Activation functions being compared</head><p>In this paper, we compare the proposed activation functions belonging to the WSoS/AWSoS family with a set of existing activation functions by evaluating the average best test accuracy for each AF over several runs. The comparison includes the following functions: All the proposed AWSoS AFs share the trainable α, β, 𝛾 parameters across the entire model in this work. This means that using an adaptive variant of the AFs add just a single set of three trainable variables to the entire model, which means that the memory footprint from using these AFs remains practically unaffected.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.2.">Testing metrics and configurations</head><p>Each of the functions is tested several times in each of the test configurations listed in Table <ref type="table" target="#tab_1">5</ref>. Within the same testing configuration, for every AF, an average image classification accuracy (as well as the standard deviation) across several test runs is evaluated for each training epoch. Only the accuracies obtained from the test dataset (not the training dataset) are used. The maximum average accuracy value that was achieved by a certain AF on any of the training epochs in certain test configuration is considered as the accuracy of this AF in this test configuration. After obtaining accuracies for each AF in each configuration, a common comparison chart for each of the configurations is made, where accuracies of all AFs can be compared with each other within the respective testing configuration. Besides evaluating AF classification accuracies for each configuration, this work explores whether some AFs tend to have better/worse accuracy across configurations. In order to make this possible while dealing with different datasets, models, and hyperparameters, which can result in different accuracy ranges, a notion of AF accuracy rank is introduced. For any given AF, its accuracy rank 𝑅 #$,&amp; within a specific configuration 𝐶 is defined as a 1-based index in a list of AFs sorted by their classification accuracy in an ascending order within this configuration 𝐶. Provided that all configurations are performed over the same set of AFs, for any set 𝑀 of multiple test configurations 𝐶 ' … 𝐶 ( , a combined rank for each AF 𝑅 #$,) can be calculated by averaging the respective perconfiguration ranks for this AF:</p><formula xml:id="formula_5">𝑅 #$,) = 1 𝑛 L 𝑅 #$,&amp; ! 
( *+' ,<label>(3)</label></formula><p>In this work, combined AF ranks are calculated for three configuration combinations: Configurations from Table <ref type="table" target="#tab_1">5</ref> that use the Adam optimizer. Configurations from Table <ref type="table" target="#tab_1">5</ref> that use the stochastic gradient descent (SGD) optimizer. All configurations from Table <ref type="table" target="#tab_1">5</ref>.</p></div>
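The rank computation of formula (3) amounts to a double argsort per configuration followed by averaging; a small sketch (illustrative only, with made-up accuracy values):

```python
import numpy as np

def accuracy_ranks(accuracies):
    """1-based rank of each AF within one configuration (ascending by accuracy)."""
    return np.argsort(np.argsort(accuracies)) + 1

def combined_ranks(acc_matrix):
    """R_{AF,M}: average of the per-configuration ranks -- formula (3).

    acc_matrix has shape (n_configs, n_afs): one row per test configuration.
    """
    per_config = np.vstack([accuracy_ranks(row) for row in acc_matrix])
    return per_config.mean(axis=0)

# Three hypothetical AFs over two configurations:
accs = np.array([[0.70, 0.75, 0.72],
                 [0.88, 0.90, 0.85]])
print(combined_ranks(accs))  # the middle AF ranks highest in both configurations
```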
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.3.">CNN models used</head><p>As can be seen in Table <ref type="table" target="#tab_1">5</ref>, each dataset is used with its respective model. The CIFAR-10 dataset is used with the CIFAR10-cls model (see Figure <ref type="figure" target="#fig_2">4</ref>), and Fashion-MNIST dataset is used with the FMNIST-cls model (see Figure <ref type="figure" target="#fig_3">5</ref>).  </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Results</head><p>The image classification accuracies that were identified in experiments for each AF in Table <ref type="table" target="#tab_0">4</ref> in each of the test configurations listed in Table <ref type="table" target="#tab_1">5</ref> can be seen in Table <ref type="table" target="#tab_2">6</ref> (for the proposed WSoS and AWSoS AFs) and Table <ref type="table" target="#tab_3">7</ref> (for existing AFs).</p><p>The AF measurements sorted by classification accuracy for the configurations CAL and CSL in Table <ref type="table" target="#tab_1">5</ref> are visualized on charts depicted in Figure <ref type="figure" target="#fig_4">6</ref> and Figure <ref type="figure">7</ref> respectively.      <ref type="table" target="#tab_1">5</ref> (lower is better), shows some AWSoS and shifted S-shaped AFs that are on average better than most Afs</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Discussion</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Analysis of the results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.1.">The advantage of adaptive AWSoS AFs with the Adam optimizer</head><p>Overall, reviewing the results from the experiments made in this work shows that the adaptive AWSoS AFs perform notably better than all other AFs when the model is trained with the Adam optimizer. Here are the respective notes that can be made regarding such observations:</p><p>• The adaptive AWSoS AF variants have a pronounced advantage in image classification accuracy over existing popular ReLU-like AFs in all testing configurations that use the Adam optimizer. With a few of exceptions all AWSoS AFs have resulted in higher image classification accuracies than all other AFs considered in this work in all testing configurations that use the Adam optimizer. This can in particular be seen by the respective combined AF ranks in Figure <ref type="figure">8</ref>. • The classification accuracy advantage of the AWSoS AFs can be seen to be even more pronounced with higher learning rates when using the In the CAH testing configuration, the highest-accuracy AF AGeSoTanh(1, 1, 1) shows an accuracy of 78.19%, which is ~3% higher than the highest-accuracy standard AF LeakyReLU(0.1) of 75.16% in this configuration. In comparison, a similar configuration with a lower learning rate CAL shows a lower advantage of ~0.8% which the highest-accuracy AWSoS AF ASiSoAsinh(1, 1, 1) (77.63%) has over the highest-accuracy existing AF LeakyReLU(0.1) (76.81%). A similar tendency can be seen on models trained for Fashion-MNIST image classification with low and high learning rates. • The choice of initial parameter values for the AWSoS AFs is seen to have no or little decisive effect with the Adam optimizer, and they consistently show higher accuracy than the existing AFs in most cases. Nevertheless, the choice of their parameter values is still important to fine tune the level of accuracy that can be achieved. 
• In 6 out of 8 testing configurations, the ASiSoAsinh(1, 1, 1) AF provided a classification accuracy higher than that of all considered existing AFs. Moreover, in 4 out of 8 configurations this particular AF showed an accuracy higher than all other compared AFs. • In the testing configurations that use the SGD optimizer, the AWSoS AFs do not have a consistent advantage over the existing AFs. Adaptive versions of these AFs are in many cases not stable in the configurations using SGD, where they often fail to converge during training. A few exceptions are the ASiSoAsinh(1, 1, 1) and AGeSoAsinh(1, 1, 1) AFs, which in three of the four SGD-related configurations provided a higher accuracy than most of the standard AFs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.2.">Comparing non-adaptive WSoS functions to existing AFs</head><p>In many cases the proposed WSoS AFs provide image classification accuracy similar to the existing ReLU-like functions. Their performance is very sensitive to the choice of the α, β, 𝛾 parameter values, so they require respective attention for choosing the suitable parameter values. The results of this work don't provide sufficient data to make recommendations about the potentially more suitable parameter values and this topic requires further research.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.3.">Observations related to shifted S-shaped AFs</head><p>The regular (non-weighted) shifted S-shaped AFs can be seen to provide an image classification accuracy that is comparable to ReLU-like AFs and typically higher than that of the ReLU AF. This is in line with the observations made in <ref type="bibr" target="#b10">[11]</ref>, which was exploring the Shifted Tanh function (named SoTanh in this paper). The experiments made in this work show that the other modifications of such a function, which are based on Atan and Asinh functions, can also significantly improve the classification accuracy compared to their regular unshifted variants.</p><p>The experiments also show that the adaptive versions of these AFs (ASoTanh, ASoAtan, ASoAsinh), which use the value of horizontal shift as a trainable parameter, in most cases provide an additional notable improvement in classification accuracy over the non-adaptive forms of these AFs (e.g., see Figure <ref type="figure" target="#fig_4">6</ref>-Figure <ref type="figure" target="#fig_0">10</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Computational performance considerations</head><p>This work primarily focuses on investigating the performance of the proposed AFs in terms of the classification accuracy in comparison with the existing ones. The analysis of the computational performance of the proposed functions, which measures the time required to train the model, and the time required to perform a forward pass when using the model in a production environment, was not the target of this work. Nevertheless, preliminary analysis confirms the intuitive assumption that using a function which requires more computations resources, like the AWSoS functions, requires more time. Preliminary measurements show that, when training on CPU, the AWSoS functions can take from ~10% more time than the Swish AF (for ASiSoTanh AF) to ~80% more time than the Swish AF (for ASiSoAsinh AF), but a more thorough study is required to identify the relative cost of using the WSoS/AWSoS AFs relative to the existing ones.</p><p>Besides, additional research is needed to evaluate the training speed of the proposed AFs in terms of the number of epochs required to reach certain accuracy, which, in combination with the assessment of the relative computational cost per one epoch, could allow a more realistic evaluation of the actual training speed that the proposed AFs can provide.</p><p>Nevertheless, the advantage that the AWSoS AFs can provide in terms of the classification accuracy can be important in some applications by itself regardless of the extra computational cost that might be required to train the model that achieves a higher performance, or use it in a production environment.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Future work</head><p>As was mentioned above, a notable tendency about the AWSoS function is their pronounced advantage over the considered existing AFs with the Adam optimizer, but a not as good performance with the SGD optimizer. This difference requires further research to try to identify ways to improve their performance with the SGD optimizer. One hypothesis that might explain this issue is that the weight initialization method used in this research (Glorot uniform) might lead the model with these AFs to poor convergence while preventing it from finding a global minimum, which is mitigated by the Adam optimizer, but not SGD.</p><p>Other directions of further research include exploring the possibility of more computationally efficient variants of WSoS/AWSoS functions, exploring whether some parameter configurations for WSoS functions can be recommended as potentially more efficient ones, and exploring how the WSoS/AWSoS functions perform in significantly deeper networks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>This work proposes a class of weighted shifted origin-aligned S-shaped activation functions (WSoS AFs) and explores their performance in image classification tasks using CNNs in comparison with a range of existing AFs. An emphasis is made on comparing the proposed AFs with ReLU-like AFs, which are the most popular choice of AFs with CNNs.</p><p>These functions are considered as an evolution of shifted origin-aligned S-shaped functions (e.g. the ones similar to Shifted Tanh in <ref type="bibr" target="#b10">[11]</ref>), and are at the same time viewed as softly-bounded versions of ReLU-like functions GELU and SiLU in this work. The results of experiments show that they can indeed be used to improve the classification accuracy of shifted S-shaped functions and can compete with most of ReLU-like functions, but the classification accuracy that they provide significantly depends on the choice of their three parameters, which can be challenging.</p><p>At the same time, a notable result of this work is that the adaptive versions of the WSoS AFs (AWSoS AFs) in most of the tested configurations show a clear advantage over all tested existing AFs including the existing adaptive ones, but this advantage holds only when the training is done with the Adam optimizer, and not the SGD optimizer, where the training is often not stable with these AFs.</p><p>Further research is needed to explore ways of achieving similar advantages of AWSoS AFs with the SGD optimizer, which, according to preliminary experiments, could be made with changing the weight initialization method. Besides, more computationally effective forms of WSoS/AWSoS functions can also be explored in the future research. 
Another line of future research would consider AWSoS AFs in combination with other learning algorithms, including their robust modifications <ref type="bibr" target="#b11">[12]</ref>, and other neural network architectures <ref type="bibr" target="#b12">[13]</ref>.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Shifted origin-aligned S-shaped functions with 𝛼 = 1.0</figDesc><graphic coords="4,133.13,191.16,350.57,213.80" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :Figure 3 :</head><label>23</label><figDesc>Figure 2: Weighted shifted origin-aligned S-shaped functions with α, β, 𝛾 all equal to 1.0</figDesc><graphic coords="5,133.13,292.45,350.57,213.80" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 4:</head><label>4</label><figDesc>Figure 4: The CIFAR10-cls model used for CIFAR-10 image classification in this work</figDesc><graphic coords="8,103.00,187.49,397.00,103.94" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 5:</head><label>5</label><figDesc>Figure 5: The FMNIST-cls model used for Fashion-MNIST image classification in this work</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 6:</head><label>6</label><figDesc>Figure 6: CIFAR-10 classification accuracies for all AFs with the Adam optimizer and learning rate of 0.001 (testing configuration CAL); demonstrates an advantage of AWSoS AFs</figDesc><graphic coords="8,91.95,436.33,451.00,300.65" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 7: Figure 8:</head><label>78</label><figDesc>Figure 7: CIFAR-10 classification accuracies for all AFs with the SGD optimizer and learning rate of 0.03 (testing configuration CSL); shows that there is no consistent advantage of AWSoS AFs with the SGD optimizer</figDesc><graphic coords="11,77.75,63.55,451.00,300.65" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 9: Figure 10:</head><label>910</label><figDesc>Figure 9: Combined accuracy ranks for all testing configurations using the SGD optimizer (configurations CSL, CSH, FSL, FSH; lower is better); demonstrates poor performance of WSoS AFs with the SGD optimizer</figDesc><graphic coords="12,77.75,63.55,451.00,300.65" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 4 Activation functions compared in this work</head><label>4</label><figDesc></figDesc><table><row><cell>Category of AFs</cell><cell>AFs</cell><cell>Sets of parameter values/initial parameter values</cell></row><row><cell>The proposed WSoS and</cell><cell>SiSoTanh, SiSoAtan, SiSoAsinh,</cell><cell>𝛼 = 1, 𝛽 = 1, 𝛾 = 1</cell></row><row><cell>AWSoS AFs</cell><cell>GeSoTanh, GeSoAtan, GeSoAsinh,</cell><cell>𝛼 = 1, 𝛽 = 10, 𝛾 = 2.6</cell></row><row><cell></cell><cell>ASiSoTanh, ASiSoAtan, ASiSoAsinh,</cell><cell></cell></row><row><cell></cell><cell>AGeSoTanh, AGeSoAtan, AGeSoAsinh</cell><cell></cell></row><row><cell></cell><cell>SiSoTanh, GeSoTanh</cell><cell>𝛼 = 1, 𝛽 = 1.5, 𝛾 = 3.64</cell></row><row><cell></cell><cell>SiSoAtan, GeSoAtan</cell><cell>𝛼 = 1, 𝛽 = 1.2, 𝛾 = 3.2</cell></row><row><cell>Shifted origin-aligned S-</cell><cell>SoTanh, ASoTanh</cell><cell>𝛼 = 1</cell></row><row><cell>shaped AFs, and their</cell><cell>SoAtan, ASoAtan</cell><cell>𝛼 = 1</cell></row><row><cell>adaptive variants</cell><cell>SoAsinh, ASoAsinh</cell><cell>𝛼 = 1</cell></row><row><cell>Popular existing AFs</cell><cell>ReLU, ReLU-6, SiLU, GELU, ELU, Softsign,</cell><cell>N/A</cell></row><row><cell></cell><cell>Sigmoid, Tanh, Arctan, Asinh</cell><cell></cell></row><row><cell></cell><cell>Leaky ReLU</cell><cell>0.01</cell></row><row><cell></cell><cell></cell><cell>0.1</cell></row><row><cell></cell><cell></cell><cell>0.3</cell></row><row><cell></cell><cell></cell><cell>0.6</cell></row><row><cell></cell><cell>PReLU (with per-neuron trainable param.)</cell><cell>0</cell></row><row><cell></cell><cell>PReLU</cell><cell>0.01</cell></row><row><cell></cell><cell>(with trainable parameter shared across</cell><cell>0.2</cell></row><row><cell></cell><cell>the whole model)</cell><cell>0.4</cell></row><row><cell></cell><cell>Swish</cell><cell>0.33</cell></row><row><cell></cell><cell>(with trainable parameter shared across</cell><cell>1</cell></row><row><cell></cell><cell>the whole model)</cell><cell>3</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 5 AF testing configurations</head><label>5</label><figDesc>In all cases, CNN kernel weights are initialized with the Glorot uniform weight initialization method, and biases are initialized with zeros.</figDesc><table><row><cell>Configuration Name</cell><cell>Dataset</cell><cell>Model</cell><cell>Optimizer</cell><cell>Learning Rate</cell><cell>Batch Size</cell><cell>No of epochs</cell><cell>No of runs</cell></row><row><cell>CAL</cell><cell>CIFAR-10</cell><cell>CIFAR10-cls</cell><cell>Adam</cell><cell>0.001</cell><cell>32</cell><cell>30</cell><cell>10</cell></row><row><cell>CAH</cell><cell>CIFAR-10</cell><cell>CIFAR10-cls</cell><cell>Adam</cell><cell>0.002</cell><cell>32</cell><cell>30</cell><cell>3</cell></row><row><cell>CSL</cell><cell>CIFAR-10</cell><cell>CIFAR10-cls</cell><cell>SGD</cell><cell>0.03</cell><cell>32</cell><cell>30</cell><cell>10</cell></row><row><cell>CSH</cell><cell>CIFAR-10</cell><cell>CIFAR10-cls</cell><cell>SGD</cell><cell>0.06</cell><cell>32</cell><cell>30</cell><cell>3</cell></row><row><cell>FAL</cell><cell>Fashion-MNIST</cell><cell>FMNIST-cls</cell><cell>Adam</cell><cell>0.001</cell><cell>32</cell><cell>30</cell><cell>10</cell></row><row><cell>FAH</cell><cell>Fashion-MNIST</cell><cell>FMNIST-cls</cell><cell>Adam</cell><cell>0.01</cell><cell>32</cell><cell>30</cell><cell>3</cell></row><row><cell>FSL</cell><cell>Fashion-MNIST</cell><cell>FMNIST-cls</cell><cell>SGD</cell><cell>0.03</cell><cell>32</cell><cell>30</cell><cell>10</cell></row><row><cell>FSH</cell><cell>Fashion-MNIST</cell><cell>FMNIST-cls</cell><cell>SGD</cell><cell>0.3</cell><cell>32</cell><cell>30</cell><cell>3</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 6 Image classification accuracies for the proposed WSoS and AWSoS AFs along with std. deviations, %</head><label>6</label><figDesc></figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 7 Image classification accuracies for existing AFs along with std. deviations, %</head><label>7</label><figDesc></figDesc></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Rectifier Nonlinearities Improve Neural Network Acoustic Models</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">L</forename><surname>Maas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Y</forename><surname>Hannun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Y</forename><surname>Ng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 30th International Conference on Machine Learning</title>
				<meeting>the 30th International Conference on Machine Learning</meeting>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="page">3</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning</title>
		<author>
			<persName><forename type="first">S</forename><surname>Elfwing</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Uchibe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Doya</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.neunet.2017.12.012</idno>
	</analytic>
	<monogr>
		<title level="j">Neural Networks</title>
		<imprint>
			<biblScope unit="volume">107</biblScope>
			<biblScope unit="page" from="3" to="11" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Hendrycks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Gimpel</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.1606.08415</idno>
		<title level="m">Gaussian Error Linear Units (GELUs)</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)</title>
		<author>
			<persName><forename type="first">D</forename><surname>Clevert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Unterthiner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hochreiter</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Learning Representations</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Deep Sparse Rectifier Neural Networks</title>
		<author>
			<persName><forename type="first">X</forename><surname>Glorot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bordes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Artificial Intelligence and Statistics</title>
				<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Review and Comparison of Commonly Used Activation Functions for Deep Neural Networks</title>
		<author>
			<persName><forename type="first">T</forename><surname>Szandała</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-981-15-5495-7</idno>
	</analytic>
	<monogr>
		<title level="m">Bio-inspired Neurocomputing, Lectures on Embedded Systems</title>
				<editor>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Bhoi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><forename type="middle">K</forename><surname>Mallick</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Liu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><forename type="middle">E</forename><surname>Balas</surname></persName>
		</editor>
		<meeting><address><addrLine>Singapore</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Activation functions in deep learning: A comprehensive survey and benchmark</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">R</forename><surname>Dubey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">K</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">B</forename><surname>Chaudhuri</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.neucom.2022.06.111</idno>
	</analytic>
	<monogr>
		<title level="j">Neurocomputing</title>
		<imprint>
			<biblScope unit="volume">503</biblScope>
			<biblScope unit="page" from="92" to="108" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Bounded activation functions for enhanced training stability of deep neural networks on visual pattern recognition problems</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Liew</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Khalil-Hani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bakhteri</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.neucom.2016.08.037</idno>
	</analytic>
	<monogr>
		<title level="j">Neurocomputing</title>
		<imprint>
			<biblScope unit="volume">216</biblScope>
			<biblScope unit="page" from="718" to="734" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Convolutional Deep Belief Networks on CIFAR-10</title>
		<author>
			<persName><forename type="first">A</forename><surname>Krizhevsky</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Nicolae</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.1809.09534</idno>
		<title level="m">PLU: The Piecewise Linear Unit Activation Function</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Tanh Works Better With Asymmetry</title>
		<author>
			<persName><forename type="first">D</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">NIPS &apos;23: Proceedings of the 37th International Conference on Neural Information Processing Systems</title>
				<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="volume">549</biblScope>
			<biblScope unit="page" from="12536" to="12554" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Robust Learning Algorithm for Networks of Neuro-Fuzzy Units</title>
		<author>
			<persName><forename type="first">Ye</forename><surname>Bodyanskiy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Popov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Titov</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-90-481-3658-2_59</idno>
	</analytic>
	<monogr>
		<title level="m">Innovations and Advances in Computer Sciences and Engineering</title>
				<editor>
			<persName><forename type="first">T</forename><surname>Sobh</surname></persName>
		</editor>
		<meeting><address><addrLine>Dordrecht</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Feedforward neural network with a specialized architecture for estimation of the temperature influence on the electric load</title>
		<author>
			<persName><forename type="first">Ye</forename><surname>Bodyanskiy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Popov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rybalchenko</surname></persName>
		</author>
		<idno type="DOI">10.1109/IS.2008.4670444</idno>
	</analytic>
	<monogr>
		<title level="m">Proc. 2008 4th International IEEE Conference Intelligent Systems</title>
				<meeting>2008 4th International IEEE Conference Intelligent Systems<address><addrLine>Varna, Bulgaria</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="7" to="14" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
