<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Comparison of Machine Learning approaches for Stress Detection from Wearable Sensors Data</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Michela</forename><surname>Quadrini</surname></persName>
							<email>michela.quadrini@unicam.it</email>
							<affiliation key="aff0">
								<orgName type="department">School of Science and Technology</orgName>
								<orgName type="institution">University of Camerino</orgName>
								<address>
									<addrLine>Via Madonna delle Carceri, 9</addrLine>
									<postCode>62032</postCode>
									<settlement>Camerino</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Denise</forename><surname>Falcone</surname></persName>
							<email>denise.facone@studenti.unicam.it</email>
							<affiliation key="aff0">
								<orgName type="department">School of Science and Technology</orgName>
								<orgName type="institution">University of Camerino</orgName>
								<address>
									<addrLine>Via Madonna delle Carceri, 9</addrLine>
									<postCode>62032</postCode>
									<settlement>Camerino</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Gianluca</forename><surname>Gerard</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Sorint.Tek</orgName>
								<address>
									<addrLine>17 Zanica Grassobbio</addrLine>
									<postCode>BG, 24050</postCode>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Comparison of Machine Learning approaches for Stress Detection from Wearable Sensors Data</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">2530B271C67696EFDFE379B7E177E720</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T16:56+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Physiological Signals</term>
					<term>Binary and multi-class classification</term>
					<term>Wearable Sensor Data</term>
					<term>time series</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Stress is a prevalent and growing phenomenon in the modern world, potentially leading to significant repercussions on both physical and mental health. The analysis of physiological signals collected from wearable sensors has emerged as a promising approach to predicting and managing stress. Methods based on machine learning techniques have been defined in the literature and have achieved promising results using handcrafted features extracted from the signal. However, there is no consensus on the list of features, while deep learning approaches that overcome this problem require significant computational power and a large amount of data. In this paper, we present a comprehensive view of the most common representative machine learning algorithms applied to the stress detection domain, giving a reference point for both academia and industry professionals in this application field. This study considers fragments of signals without extracting any features and uses a public dataset, WESAD, that contains high-resolution physiological signals, including blood volume pulse, electrocardiogram, and electromyogram. The data, collected from 15 subjects during a lab study, are heterogeneous and characterized by different frequencies and by noise due to some devices. After preprocessing, we assess the performance of eight machine learning algorithms belonging to four model families (tree, ensemble, linear, and neighbours) on WESAD, facing the problem as both binary (stress/no-stress) and multiclass (baseline, stress, and amusement) classification. Our results, evaluated in terms of classical metrics, show that Random Forest outperforms the others in both the binary and multi-class approaches.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Stress is a non-specific body reaction to any demand upon it. Its effects influence overall behaviour, well-being, and potential personal and professional success <ref type="bibr" target="#b0">[1]</ref>. Chronic stress may give rise to significant physical and mental health issues, such as cancer, cardiovascular disease, depression, and diabetes. It is an increasingly prevalent and pervasive phenomenon in the modern world: more than 50% of all work-related ill health cases in 2020/21 were due to stress <ref type="bibr" target="#b1">[2]</ref>. Assessments based on psychologically designed questions, such as the Perceived Stress Scale (PSS) <ref type="bibr" target="#b2">[3]</ref>, are frequently used to detect stress. However, these methods may be time-consuming, psychologically invasive, and lacking in reliability. Therefore, the definition of non-invasive approaches for rapid and accurate stress detection influences the quality and wellness of people's lives: managing stress before it causes health issues is fundamental. The literature has demonstrated that physiological signals, a response of the Autonomic Nervous System, allow us to detect and monitor stress. Hovsepian et al. <ref type="bibr" target="#b3">[4]</ref> pioneered stress detection using physiological signals, facing it as a binary classification problem, whereas Gjoreski et al. <ref type="bibr" target="#b4">[5]</ref> aimed at distinguishing different levels of stress (no stress versus low stress versus high stress). Such biosignals can be captured non-invasively by wearable devices, such as smartphones and smartwatches, commonly used among people. Such devices can monitor physiological parameters such as Blood Volume Pulse (BVP), Electrodermal Activity (EDA), temperature (TEMP), and heart rate (HR). 
In the scenario of stress detection, machine learning and deep learning methodologies achieve promising results by analyzing these data. These approaches include support vector machines, random forests, and k-nearest neighbours, and use handcrafted features extracted from the pre-processed signal to reduce noise in the data <ref type="bibr" target="#b5">[6]</ref>. Moreover, no consensus has been reached on the list of features to extract from physiological data <ref type="bibr" target="#b6">[7]</ref>. To solve this problem, advanced deep learning approaches have been applied, since they are able to automatically comprehend patterns and, thus, extract features. Nevertheless, they require significant computational power and a large amount of data. The choice of the appropriate machine learning algorithm for a particular task is not trivial: no single classifier works best across all possible scenarios, as the no-free-lunch theorem states <ref type="bibr" target="#b7">[8]</ref>. To the best of our knowledge, no scientific work compares machine learning methods for stress detection on the same datasets without feature extraction or dimensionality reduction.</p><p>In this paper, we present a comprehensive view of the most common representative machine learning algorithms applied to the stress detection domain, giving a reference point for both academia and industry professionals in this application field. In the analysis, we consider fragments of signals without extracting any features, due to the nature of the problem: stress determines nonspecific human responses, and feature selection depends on the subject and cannot be generalized. Such signal fragments contain samples of all the physiological parameters measured. After appropriate resampling and noise reduction, these values are linearized and constitute the input of the considered ML models, following the neural network approach. 
This study uses the WESAD <ref type="bibr" target="#b8">[9]</ref> dataset, which is public and stores 12 physiological signals, such as blood volume pulse and electrocardiogram, collected from 15 subjects during a lab study. After preprocessing (consisting of resampling, outlier removal, and normalization), we obtain a dataset of samples that are signal fragments extracted using the sliding window approach. Over these entries, we evaluate the most common and popular methods, widely used in various application areas. We consider eight machine learning algorithms, i.e., Decision Tree (DT), Random Forest (RF), AdaBoost (AB), Extremely Randomized Trees (ExT), Passive Aggressive Classifier (PA), Logistic Regression (LR), K-Nearest Neighbors (KNN), and Nearest Centroid (NC). We face both the binary (stress/no-stress) and multi-class (baseline, stress, and amusement) classification problems. The results, evaluated in terms of classical metrics, show that RF outperforms the others in both the binary and multi-class approaches. We also compare the results obtained with those in the literature <ref type="bibr" target="#b8">[9]</ref>.</p><p>The paper is organized as follows. Section 2 describes the materials and methods used in this study. The pipeline of the approach, with the main results, is described in Section 3. The paper ends with some conclusions and future work in Section 4.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">MATERIALS AND METHODS</head><p>This work proposes a comparative evaluation of ML approaches to understand the best approach for real-time analytics. For this study, we consider the WESAD dataset.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Dataset</head><p>WESAD is a public dataset designed for stress and affective detection. It is a high-quality multimodal dataset storing physiological and movement data of 15 subjects (12 male and 3 female) during a controlled lab experiment <ref type="bibr" target="#b8">[9]</ref>. None of the participants were heavy smokers, and none suffered from chronic mental or cardiovascular disorders. Furthermore, the female subjects were not pregnant. The dataset includes blood volume pulse (BVP), electrocardiogram (ECG), electrodermal activity (EDA), electromyogram (EMG), respiration (RESP), body temperature (TEMP), and three-axis acceleration (ACC). ECG, EDA, EMG, RESP, TEMP, and ACC were recorded by a chest-worn device (RespiBAN) and sampled at 700 Hz, whereas a wrist-worn device (Empatica E4) recorded BVP (sampled at 64 Hz), EDA (at 4 Hz), TEMP (at 4 Hz), and ACC (at 32 Hz). The dataset comprises 14 time series, each spanning approximately 2 hours, the total duration of the experiment. The experiments were conducted to capture three distinct affective states: baseline, stress, and amusement, with durations of 20 minutes, 392 seconds, and 7 minutes, respectively. They also included two meditation periods. To capture the data during the experiment, a particular protocol, depicted in Figure <ref type="figure">1</ref>, was used. It consists of two different versions, in which the amusement and stress conditions are interchanged between subjects to avoid order effects.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 1:</head><p>The two protocol versions used to collect data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Preprocessing</head><p>The varied sampling frequencies in WESAD, as detailed in Section 2.1, necessitated a harmonization step. We resampled all data to match the 700 Hz frequency of the RespiBAN. Therefore, resampling is applied only to the time series recorded by the Empatica E4, using the Fourier method as an upsampling technique.</p><p>After the resampling, we remove the outliers due to occasional anomalous peaks in some signals, which may be attributed to instrumental errors or measurement noise. We removed the anomalies from each time series by using a Hampel filter, discussed in <ref type="bibr" target="#b9">[10]</ref>. Such a filter uses 1-minute sliding windows as input and calculates the mean (𝜇) and standard deviation (𝜎) of the values within the corresponding interval. Observations further than the threshold of 3𝜎 from the mean within the respective window are classified as outliers (following Pearson's rule) and are substituted with the nearest chronological value. This strategy ensures that outlier substitution does not introduce significant high-frequency variations.</p><p>After outlier removal, we normalize all signals to the interval [−1, 1] to treat all inputs equally. Let 𝑋 = {𝑥1, 𝑥2, . . . , 𝑥𝑛} be the considered time series with 𝑛 components, where each component corresponds to a biophysical signal. Each component is rescaled to the interval [−1, 1] by applying the mean normalization, where 𝑚𝑎𝑥(𝑋) and 𝑚𝑖𝑛(𝑋) are the maximum and minimum values of each component of 𝑋, respectively. Therefore, the input is the scaled time series 𝑋 ˜= {𝑥 ˜1, 𝑥 ˜2, . . . , 𝑥 ˜𝑛}.</p></div>
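The Hampel filtering step described above can be sketched as follows. This is a minimal illustration assuming each signal is held in a pandas Series; the function name `hampel_filter` and its parameters are our own, and forward-filling is used as an approximation of substitution with the nearest chronological value:

```python
import numpy as np
import pandas as pd

def hampel_filter(signal: pd.Series, fs: int = 700, n_sigma: float = 3.0) -> pd.Series:
    """Flag samples further than n_sigma from the rolling mean of a
    1-minute window and replace them with a nearby chronological value."""
    window = 60 * fs  # 1-minute sliding window, as in the paper
    mu = signal.rolling(window, center=True, min_periods=1).mean()
    sigma = signal.rolling(window, center=True, min_periods=1).std().fillna(0.0)
    outliers = (signal - mu).abs() > n_sigma * sigma
    # Outliers become NaN, then are filled from the nearest preceding
    # (or, at the start of the series, following) observation.
    return signal.mask(outliers).ffill().bfill()
```

With the default parameters, the window covers 42,000 samples at 700 Hz; the centred rolling statistics mirror the per-window mean and standard deviation described above.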
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Dataset Entry</head><p>After the data preprocessing phase, we create two datasets: one for binary classification and the other for multiclass classification. All entries are obtained by applying the sliding window technique to the preprocessed signals. Specifically, the entries consist of time series fragments characterized by a single emotional state (or label), obtained with a window of 60 seconds and a stride of 30 seconds, according to the study in <ref type="bibr" target="#b10">[11]</ref>. To create the multiclass dataset, we consider the parts of the time series associated with Stress, Baseline, and Amusement, as described in Section 2.1. For the binary classification, both the Baseline and Amusement states were aggregated under a single 'non-stress' label. The label distributions of the two datasets are shown in Fig. <ref type="figure" target="#fig_0">2</ref>.</p></div>
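The windowing described above can be sketched as follows. This assumes the preprocessed signals are stacked in a `(n_samples, n_channels)` NumPy array with a per-sample label vector; the function name and parameters are illustrative:

```python
import numpy as np

def sliding_windows(signals: np.ndarray, labels: np.ndarray, fs: int = 700,
                    window_s: int = 60, stride_s: int = 30):
    """Cut multichannel signals (n_samples, n_channels) into fixed-length
    fragments, keeping only windows that carry a single label."""
    win, stride = window_s * fs, stride_s * fs
    X, y = [], []
    for start in range(0, signals.shape[0] - win + 1, stride):
        lab = labels[start:start + win]
        if (lab == lab[0]).all():  # keep windows with one emotional state only
            X.append(signals[start:start + win].ravel())  # linearized fragment
            y.append(lab[0])
    return np.asarray(X), np.asarray(y)
```

Each retained row is a linearized fragment containing all channels, matching the input representation described in the Introduction.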
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Machine Learning Algorithms</head><p>In this section, we describe some machine learning classification techniques. Interested readers can refer to <ref type="bibr" target="#b11">[12]</ref> for a complete treatment of machine learning approaches.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.1.">Decision Tree</head><p>A DT is a non-parametric supervised learning algorithm for classification and regression in the form of a tree structure <ref type="bibr" target="#b12">[13]</ref>. It predicts the value of a target variable by learning simple decision rules inferred from the data features. The method exploits the "divide et impera" approach to learning: it learns from data with a set of if-then-else decision rules. The depth directly correlates with the complexity of these decision rules. The output is a tree comprising decision nodes and leaf nodes: a decision node has two or more branches, and a leaf node represents a classification or decision. The root of the tree corresponds to the best predictor. Usually, a DT is pruned by combining the adjacent nodes to avoid overfitting.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.2.">Ensemble models</head><p>Ensemble learning combines the predictions of a number of different models. Through such a combination, an ensemble tends to be more flexible and less sensitive to the data. AdaBoost AdaBoost, Adaptive Boosting, is an ensemble method developed by Freund and Schapire <ref type="bibr" target="#b14">[15]</ref>. It employs an iterative approach to improve weak classifiers by learning from their errors. Unlike Random Forest, which uses parallel ensembling, AdaBoost uses sequential ensembling; therefore, its training cannot be parallelized on a multiprocessor machine as Random Forest's can. It builds a classifier of high accuracy by combining many poorly performing classifiers. The resulting classifier is obtained through sequential weight adjustments, individual voting powers, and a weighted sum of the final classifiers.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Random Forest</head><p>Extremely Randomized Trees Extremely Randomized Trees, introduced in <ref type="bibr" target="#b15">[16]</ref>, is an ensemble method that performs regression or classification. It creates a large number of unpruned decision trees from the training dataset and uses majority voting over the trees' predictions for classification. Differently from Random Forest, it uses the entire dataset to train each decision tree. Moreover, it randomly selects the values at which to split a feature and create child nodes, to ensure sufficient differences between individual decision trees.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.3.">Linear Models</head><p>Logistic Regression Logistic Regression, introduced in <ref type="bibr" target="#b16">[17]</ref>, is a supervised learning algorithm mainly used for classification tasks where the aim is to estimate the probability of an instance belonging to a specific class based on the values of the input features. The method uses the sigmoid function to map any real-valued number into a value between 0 and 1. More specifically, it calculates a weighted sum of the input features, applies the logistic function to this sum, and then classifies the input as belonging to one of the two classes based on a chosen threshold.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Passive Aggressive</head><p>The passive-aggressive algorithm, introduced in <ref type="bibr" target="#b17">[18]</ref>, is one of the few "online learning algorithms": the input data comes in sequential order, and the model is updated step-by-step. It is useful in applications that receive data as a continuous flow and need to adapt rapidly or autonomously to change, or when computing resources are limited. The algorithm is based on passive and aggressive update steps: if the prediction is correct, the model is kept unchanged (passive), while if the prediction is incorrect, the model is updated (aggressive).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.4.">Neighbors-based Models</head><p>Supervised neighbors-based models can be applied for classification and regression. The principle behind nearest neighbor methods is to find a predefined number of training samples closest in distance to the new point, and predict the label from these.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>K-Nearest Neighbors</head><p>The k-nearest neighbours algorithm, introduced by Fix and Hodges in 1951 <ref type="bibr" target="#b18">[19]</ref> and expanded by <ref type="bibr" target="#b19">[20]</ref>, is a non-parametric supervised learning method for classification and regression. The k-nearest neighbours algorithm exploits proximity to make classifications or predictions about the grouping of an individual data point. KNN searches for the k nearest labelled training data using a distance metric and attributes to the new observation the label that appears most often among them. In our study, we use the Minkowski distance as the metric.</p><p>The Nearest Centroid algorithm, instead, assumes that the centroids are distinct for each class (target label). The training data are divided into clusters based on their class labels, and the centroid, i.e., the mean value of each of the input variables, is computed for each cluster. The set of centroids represents the "model": given a new example, the algorithm assigns the label of the centroid closest to it.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.5.">Metrics</head><p>We evaluate the performance and effectiveness of the approaches by using Accuracy (𝐴𝑐𝑐), Precision (𝑃 ), Recall (𝑅), and F-measure (𝐹1), defined as follows</p><formula xml:id="formula_0">𝐴𝑐𝑐 = (𝑇𝑃 + 𝑇𝑁) / (𝑇𝑃 + 𝑇𝑁 + 𝐹𝑃 + 𝐹𝑁), 𝑃 = 𝑇𝑃 / (𝑇𝑃 + 𝐹𝑃), 𝑅 = 𝑇𝑃 / (𝑇𝑃 + 𝐹𝑁), 𝐹1 = (2 • 𝑃 • 𝑅) / (𝑃 + 𝑅)</formula><p>where 𝑇𝑃 is the number of true positives, 𝐹𝑁 the number of false negatives, 𝐹𝑃 the number of false positives, and 𝑇𝑁 the number of true negatives.</p></div>
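The metric definitions above translate directly into code; a minimal sketch for binary labels (the function name is ours):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 computed from the four
    confusion-matrix counts, exactly as in the formulas above."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / (tp + tn + fp + fn)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1
```

In the multiclass setting these quantities are computed per class and averaged; scikit-learn's `precision_score`, `recall_score`, and `f1_score` provide the same values via their `average` parameter.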
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">RESULTS</head><p>The work aims to compare various machine learning algorithms to detect stress from signals captured by wearable devices. The workflow is described in Section 3.1, while the results of the experiments are described in Section 3.2.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Methodology</head><p>Our pipeline, depicted in Fig. <ref type="figure" target="#fig_3">3</ref>, is implemented in Python using the scikit-learn package for the machine learning approaches and SciPy for data manipulation and analysis. In particular, some methods of the SciPy library are used in the data preprocessing phase: the resample method performs the resampling of signals, and in our approach all signals are resampled at 700 Hz. For outlier removal, the Hampel filter is implemented using the 'rolling', 'mean', 'std', 'fillna', 'mask', and 'interpolate' methods from the Pandas library. The 'MinMaxScaler' class of the scikit-learn package is used to perform data normalization. The machine learning methods Decision Tree, Random Forest, K-Nearest Neighbors, and Logistic Regression are implemented via the tree, ensemble, neighbors, and linear_model modules, respectively. The K-Folds method is used to split the dataset into 𝑘 consecutive folds without shuffling; each fold is then used once as a validation set, while the remaining 𝑘 − 1 folds form the training set.</p></div>
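The preprocessing and model set-up described above can be condensed into a short sketch; the one-minute random signal stands in for a WESAD channel, and all variable names are illustrative:

```python
import numpy as np
from scipy.signal import resample            # Fourier-method resampling
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Upsample one minute of a 4 Hz Empatica E4 channel to 700 Hz
eda_4hz = np.random.rand(4 * 60)
eda_700hz = resample(eda_4hz, 700 * 60)

# Rescale the channel to [-1, 1]
scaler = MinMaxScaler(feature_range=(-1, 1))
eda_scaled = scaler.fit_transform(eda_700hz.reshape(-1, 1)).ravel()

# The four classifiers used in the experiments, with default parameters
classifiers = {
    "DT": DecisionTreeClassifier(),
    "RF": RandomForestClassifier(),
    "KNN": KNeighborsClassifier(),
    "LR": LogisticRegression(),
}
```

Each classifier exposes the same `fit`/`predict` interface, which is what makes the comparison across model families straightforward.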
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Experiments</head><p>Given the small number of subjects involved in the experiment, we use Leave-One-Subject-Out Cross-Validation (LOSOCV), i.e., an approach that utilizes each subject as a "test" set and the remaining 14 as a "training" set. The experiments have been performed considering decision tree, random forest, K-Nearest Neighbors, and logistic regression as machine learning methods. For all experiments, we use the default parameters.</p><p>We evaluate the experiments by considering Accuracy, Precision, Recall, and F1-Score as metrics. Table <ref type="table">1</ref> shows the average values, with the standard deviation, of the considered metrics for binary and multiclass classification, respectively. Appendix A reports the values for each experiment. </p></div>
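LOSOCV can be expressed with scikit-learn's LeaveOneGroupOut splitter, treating subject identifiers as groups; a sketch under that assumption (the function name and toy data below are ours):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import LeaveOneGroupOut

def losocv_accuracy(X, y, subjects, model=None):
    """Leave-One-Subject-Out CV: each subject is the test set exactly once,
    the remaining subjects form the training set."""
    model = model or RandomForestClassifier()
    scores = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subjects):
        model.fit(X[train_idx], y[train_idx])
        scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))
    return np.mean(scores), np.std(scores)
```

The per-fold scores are what the averages and standard deviations in Table 1 summarize; the same loop applies unchanged to precision, recall, and F1.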
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>Average values of the metrics, with their standard deviations, for the binary and multiclass classifications</p><p>The Random Forest model outperforms its counterparts in both the binary and multiclass classification scenarios. For the RF model, the obtained accuracy is 92% (binary) and 70% (multiclass); the corresponding F1-scores are 88.2% and 60%, respectively. While multiclass classification offers insights for emotion detection via wearables, there remains room for improvement. Comparing our results with Schmidt et al.'s benchmark on the WESAD dataset <ref type="bibr" target="#b8">[9]</ref>, which utilized standard machine learning techniques and handcrafted features, our study finds that the RF algorithm delivers superior performance. The accuracy and F1-score values are reported in Table <ref type="table">2</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2</head><p>Average values of the metrics, with their standard deviations, for the binary and multiclass classifications with features extracted from the signals <ref type="bibr" target="#b8">[9]</ref></p><p>Comparing the results, we note that the methods perform better using signal values than signal features.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">CONCLUSIONS AND FUTURE WORK</head><p>In this work, we have compared various classical machine learning algorithms, using the public WESAD dataset. Analyzing the results, we have noted that the best results were achieved by the random forest algorithm. This evidence is in line with the results reported in the literature <ref type="bibr" target="#b8">[9]</ref>. We have also observed that classifications based on the signal values outperform those based on signal features.</p><p>In future work, we intend to conduct additional experiments to discern the most relevant physiological signals. This represents another fundamental aspect of detecting stress for real-time analysis using wearable sensors and smartphones: the aim is to store the minimum amount of information, in order to be non-invasive and reduce storage space while maintaining high model performance. We also intend to employ deep learning approaches, such as graph convolutional networks or recurrent neural networks, motivated by the results obtained in other scenarios <ref type="bibr" target="#b21">[22,</ref><ref type="bibr" target="#b22">23]</ref>. Moreover, we intend to study the role of the length of the sliding windows from a theoretical perspective, taking into account various entropy-based methods that have produced valuable outcomes in protein-protein interaction site prediction <ref type="bibr" target="#b23">[24]</ref>. Another crucial future investigation is to explore and define approaches to extract and describe the correlations that sliding windows represent. To this end, we will explore other representations, such as arc-annotated sequences for the analysis and comparison of time series, utilizing tools like <ref type="bibr" target="#b24">[25]</ref>, and strings or simplicial complexes, which allow applying techniques from formal methods to identify patterns <ref type="bibr" target="#b25">[26]</ref> or verify properties <ref type="bibr" target="#b26">[27]</ref>.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Label distributions of datasets created for multiclass and binary classification.</figDesc><graphic coords="3,89.30,84.19,192.00,72.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>Each component is rescaled to the interval [−1, 1] by applying the mean normalization: 𝑥 ˜𝑖 = ((𝑥 𝑖 − 𝑚𝑎𝑥(𝑋)) + (𝑥 𝑖 − 𝑚𝑖𝑛(𝑋))) / (𝑚𝑎𝑥(𝑋) − 𝑚𝑖𝑛(𝑋))</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head></head><label></label><figDesc>Random Forest is an ensemble model introduced by Breiman<ref type="bibr" target="#b13">[14]</ref> for both classification and regression. It constructs a set of decision trees during training and determines the prediction by selecting the most common class output by the individual trees in classification problems, or by calculating their mean prediction in regression problems. This model combines the bagging approach with a random selection of features to ensure low correlation among the decision trees of the forest: feature randomness draws a random subset of features for each tree, while, in bagging, each tree is trained on a different bootstrap sample, i.e., a sample in which entries of the training dataset may appear more than once. Differently from decision trees, which consider all the possible feature splits, random forests only select a subset of those features.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Pipeline used for the method comparison</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>The input consists of the k closest training examples in a data set, whereas the output depends on the task, classification or regression: a class membership or the property value for the entry, respectively. Nearest Centroid Nearest Centroid, defined in [21], is arguably the simplest classifier. It operates on an intuitive principle: it takes data samples as input and classifies them into the class of the training examples whose centroid (the geometric centre of a data distribution) is closest to them.</figDesc></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acknowledgements. This work has been funded by the European Union -NextGenerationEU under the Italian Ministry of University and Research (MUR) National Innovation Ecosystem grant ECS00000041 -VITALITY -CUP J13C22000430001</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Protective and damaging effects of stress mediators</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">S</forename><surname>Mcewen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">New England journal of medicine</title>
		<imprint>
			<biblScope unit="volume">338</biblScope>
			<biblScope unit="page" from="171" to="179" />
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">HSE on work-related stress</title>
	<ptr target="http://www.hse.gov.uk/statistics/causdis/-ffstress/index.htm" />
		<imprint>
			<date type="published" when="2021-03-07">2021. March 7, 2022</date>
		</imprint>
		<respStmt>
			<orgName>Health and Safety Executive</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Review of the psychometric evidence of the perceived stress scale</title>
		<author>
			<persName><forename type="first">E.-H</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Asian nursing research</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="121" to="127" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">cstress: towards a gold standard for continuous stress assessment in the mobile environment</title>
		<author>
			<persName><forename type="first">K</forename><surname>Hovsepian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Al'absi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ertin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kamarck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Nakajima</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kumar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2015 ACM international joint conference on pervasive and ubiquitous computing</title>
				<meeting>the 2015 ACM international joint conference on pervasive and ubiquitous computing</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="493" to="504" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Monitoring stress with a wrist device using context</title>
		<author>
			<persName><forename type="first">M</forename><surname>Gjoreski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Luštrek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gams</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Gjoreski</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Biomedical Informatics</title>
		<imprint>
			<biblScope unit="volume">73</biblScope>
			<biblScope unit="page" from="159" to="170" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">A review of emotion recognition using physiological signals</title>
		<author>
			<persName><forename type="first">L</forename><surname>Shu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Liao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Yang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Sensors</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page">2074</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Stress detection using deep neural networks</title>
		<author>
			<persName><forename type="first">R</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC Medical Informatics and Decision Making</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="page" from="1" to="10" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">The lack of a priori distinctions between learning algorithms</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">H</forename><surname>Wolpert</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neural Computation</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="1341" to="1390" />
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Introducing WESAD, a multimodal dataset for wearable stress and affect detection</title>
		<author>
			<persName><forename type="first">P</forename><surname>Schmidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Reiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Duerichen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Marberger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Van Laerhoven</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 20th ACM international conference on multimodal interaction</title>
				<meeting>the 20th ACM international conference on multimodal interaction</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="400" to="408" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Astola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kuosmanen</surname></persName>
		</author>
		<title level="m">Fundamentals of nonlinear digital filtering</title>
				<imprint>
			<publisher>CRC Press</publisher>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Stress detection from wearable sensor data using Gramian angular fields and CNN</title>
		<author>
			<persName><forename type="first">M</forename><surname>Quadrini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Daberdaku</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Blanda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Capuccio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bellanova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Gerard</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Discovery Science</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="173" to="183" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Shalev-Shwartz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ben-David</surname></persName>
		</author>
		<title level="m">Understanding machine learning: From theory to algorithms</title>
				<imprint>
			<publisher>Cambridge University Press</publisher>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Induction of decision trees</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Quinlan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="81" to="106" />
			<date type="published" when="1986">1986</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Random forests</title>
		<author>
			<persName><forename type="first">L</forename><surname>Breiman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="page" from="5" to="32" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Experiments with a new boosting algorithm</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Freund</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">E</forename><surname>Schapire</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ICML</title>
		<imprint>
			<biblScope unit="volume">96</biblScope>
			<biblScope unit="page" from="148" to="156" />
			<date type="published" when="1996">1996</date>
			<publisher>Citeseer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Extremely randomized trees</title>
		<author>
			<persName><forename type="first">P</forename><surname>Geurts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ernst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wehenkel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">63</biblScope>
			<biblScope unit="page" from="3" to="42" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">The regression analysis of binary sequences</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">R</forename><surname>Cox</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the Royal Statistical Society Series B: Statistical Methodology</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="page" from="215" to="232" />
			<date type="published" when="1958">1958</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Crammer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Dekel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Keshet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shalev-Shwartz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Singer</surname></persName>
		</author>
		<title level="m">Online passive-aggressive algorithms</title>
				<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Discriminatory analysis. Nonparametric discrimination: Consistency properties</title>
		<author>
			<persName><forename type="first">E</forename><surname>Fix</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Hodges</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Statistical Review/Revue Internationale de Statistique</title>
		<imprint>
			<biblScope unit="volume">57</biblScope>
			<biblScope unit="page" from="238" to="247" />
			<date type="published" when="1989">1989</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Nearest neighbor pattern classification</title>
		<author>
			<persName><forename type="first">T</forename><surname>Cover</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Hart</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Information Theory</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page" from="21" to="27" />
			<date type="published" when="1967">1967</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Diagnosis of multiple cancer types by shrunken centroids of gene expression</title>
		<author>
			<persName><forename type="first">R</forename><surname>Tibshirani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Hastie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Narasimhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Chu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Proceedings of the National Academy of Sciences</title>
		<imprint>
			<biblScope unit="volume">99</biblScope>
			<biblScope unit="page" from="6567" to="6572" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Hierarchical representation and graph convolutional networks for the prediction of protein-protein interaction sites</title>
		<author>
			<persName><forename type="first">M</forename><surname>Quadrini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Daberdaku</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ferrari</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Machine Learning, Optimization, and Data Science: 6th International Conference, LOD 2020</title>
		<title level="s">Revised Selected Papers, Part II</title>
		<meeting><address><addrLine>Siena, Italy</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2020">July 19-23, 2020</date>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="409" to="420" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Hierarchical representation for PPI sites prediction</title>
		<author>
			<persName><forename type="first">M</forename><surname>Quadrini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Daberdaku</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ferrari</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="page">96</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">PROSPs: protein sites prediction based on sequence fragments</title>
		<author>
			<persName><forename type="first">M</forename><surname>Quadrini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cavallin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Daberdaku</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ferrari</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning, Optimization, and Data Science</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="568" to="580" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">ASPRAlign: a tool for the alignment of RNA secondary structures with arbitrary pseudoknots</title>
		<author>
			<persName><forename type="first">M</forename><surname>Quadrini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Tesei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Merelli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="page" from="3578" to="3579" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Loop grammars to identify RNA structural patterns</title>
		<author>
			<persName><forename type="first">M</forename><surname>Quadrini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Merelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Piergallini</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Bioinformatics</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="302" to="309" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">A spatial logic for simplicial models</title>
		<author>
			<persName><forename type="first">M</forename><surname>Loreti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Quadrini</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Logical Methods in Computer Science</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
