<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">ALSFRS-R Score Prediction for Amyotrophic Lateral Sclerosis Notebook for the iDPP Lab on Intelligent Disease Progression Prediction at CLEF 2024</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Guido</forename><surname>Barducci</surname></persName>
							<email>guido.barducci@unito.it</email>
							<affiliation key="aff0">
								<orgName type="department">Dept. of Medical Sciences</orgName>
								<orgName type="laboratory">Computational Biomedicine Unit</orgName>
								<orgName type="institution">University of Turin</orgName>
								<address>
									<settlement>Turin</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Flavio</forename><surname>Sartori</surname></persName>
							<email>flavio.sartori@unito.it</email>
							<affiliation key="aff0">
								<orgName type="department">Dept. of Medical Sciences</orgName>
								<orgName type="laboratory">Computational Biomedicine Unit</orgName>
								<orgName type="institution">University of Turin</orgName>
								<address>
									<settlement>Turin</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Giovanni</forename><surname>Birolo</surname></persName>
							<email>giovanni.birolo@unito.it</email>
							<affiliation key="aff0">
								<orgName type="department">Dept. of Medical Sciences</orgName>
								<orgName type="laboratory">Computational Biomedicine Unit</orgName>
								<orgName type="institution">University of Turin</orgName>
								<address>
									<settlement>Turin</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Tiziana</forename><surname>Sanavia</surname></persName>
							<email>tiziana.sanavia@unito.it</email>
							<affiliation key="aff0">
								<orgName type="department">Dept. of Medical Sciences</orgName>
								<orgName type="laboratory">Computational Biomedicine Unit</orgName>
								<orgName type="institution">University of Turin</orgName>
								<address>
									<settlement>Turin</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Piero</forename><surname>Fariselli</surname></persName>
							<email>piero.fariselli@unito.it</email>
							<affiliation key="aff0">
								<orgName type="department">Dept. of Medical Sciences</orgName>
								<orgName type="laboratory">Computational Biomedicine Unit</orgName>
								<orgName type="institution">University of Turin</orgName>
								<address>
									<settlement>Turin</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">ALSFRS-R Score Prediction for Amyotrophic Lateral Sclerosis Notebook for the iDPP Lab on Intelligent Disease Progression Prediction at CLEF 2024</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">986F9A75F55AE7E45A30A68022C02EAB</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:56+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Machine Learning, ALS, ALSFRS-R</term>
					<term>0009-0005-1052-8495 (G. Barducci)</term>
					<term>0009-0004-3833-6551 (F. Sartori)</term>
					<term>0000-0003-0160-9312 (G. Birolo)</term>
					<term>0000-0003-3288-0631 (T. Sanavia)</term>
					<term>0000-0003-1811-4762 (P. Fariselli)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disorder that results in the gradual deterioration of motor abilities, leading to difficulties in breathing, speaking, and swallowing, and ultimately to death, typically within a few years. The symptoms of ALS can vary significantly from one individual to another, affecting various bodily functions and areas. To assess this wide range of symptoms, the Amyotrophic Lateral Sclerosis Functional Rating Scale-Revised (ALSFRS-R) is used. Predicting the ALSFRS-R score is clinically relevant for personalizing patient monitoring. To address this need, the Intelligent Disease Progression Prediction challenge was organized, tasking participants with developing novel methods to predict these scores using non-invasive sensor data that monitor some individual characteristics. The competition included two tasks that differed only in the way the ALSFRS-R questionnaires were completed: either by medical staff (task 1) or by the patient (task 2). Given the limited number of patients in the training set, we chose a relatively simple model, Random Forest, and preselected sensor features by retaining those most correlated with the outcome to be predicted. We selected the model with the lowest MAE estimated by cross-validation on the challenge training set. On the test set, our method attained an average Mean Absolute Error (MAE) of 0.234 and 0.311 and a Root Mean Square Error (RMSE) of 0.519 and 0.601 for tasks 1 and 2, respectively. Although the error may appear very low, this is because questionnaire values tend to remain constant from one visit to the next, which makes prediction easier.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Amyotrophic lateral sclerosis (ALS), also known as Lou Gehrig's disease, is a rapidly progressive and ultimately fatal neurological disease that affects the neurons controlling voluntary muscles in the arms, legs, and face. The yearly incidence of ALS is around 1 to 2.6 cases per 100,000 individuals, while the prevalence is approximately 6 cases per 100,000. ALS belongs to a group of motor neuron disorders and typically results in death. Previous studies report survival rates of approximately 48% and 24% at 3 and 5 years, respectively, with around 4% of patients surviving beyond 10 years. However, population-based studies show lower 5-year survival rates, ranging from 4% to 30% <ref type="bibr" target="#b0">[1]</ref> <ref type="bibr" target="#b1">[2]</ref>.</p><p>The symptoms of this disease can vary greatly from case to case and can affect different functions and areas of the body. To describe this wide range of symptoms, the Amyotrophic Lateral Sclerosis Functional Rating Scale-Revised (ALSFRS-R) is employed. It consists of a 12-item inventory, with each item rated on a 0-4 scale by patients and/or caregivers, resulting in a maximum score of 48 points. ALSFRS-R assesses patients' levels of self-sufficiency in areas including feeding, grooming, ambulation, and communication <ref type="bibr" target="#b0">[1]</ref> <ref type="bibr" target="#b2">[3]</ref>.</p><p>Given the variability of this disease, the schedule of monitoring visits should depend on its characteristics, such as the progression rate. Currently, there is no system capable of predicting the course of the disease, making it very challenging to personalize patient visits based on disease progression. The goal of this paper is to use information collected through the sensors of a commercial fitness smartwatch, past ALSFRS-R scores, and static features (such as age and sex) to predict future ALSFRS-R scores. 
The available questionnaire data come from two sources: questionnaires can be filled out by a doctor or by the patient through a dedicated smartphone application. Therefore, the challenge was divided into two tasks with the same goal but using data from different sources. The two sources differ in the intervals between consecutive questionnaires and in the medical or personal judgment of the person filling them out, which may lead to different scores despite similar symptoms. To solve these tasks, classical machine learning models were used instead of deep learning, given the small number of patients in the training dataset. For more details, we refer the reader to the challenge overview papers <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b5">5]</ref>.</p><p>The paper is organized as follows: Section 2, Related Work, reviews papers addressing similar topics; Section 3, Methodology, outlines the procedure that led to the predictions for the two tasks; Section 4, Experimental Setup, details the procedures used; Section 5, Results, presents the results along with performance metrics; and Section 6, Conclusions and Future Work, reviews the essential steps of the paper and proposes alternative methodologies that could improve predictions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Identifying prognostic factors and building predictive models for amyotrophic lateral sclerosis (ALS) progression is a longstanding and important challenge. ALS exhibits significant variability in its progression and outcomes, which makes accurate prediction difficult. Many methodologies have been tested using data from the PRO-ACT database. While this repository may not capture the full spectrum of ALS patients in the population, it is the largest publicly accessible dataset aggregating ALS clinical trials <ref type="bibr" target="#b6">[6]</ref> [7] <ref type="bibr" target="#b8">[8]</ref>.</p><p>Wearable devices have been used effectively to study individuals with ALS, demonstrating a link between ALS progression and patterns of behavior and function as measured by digital wearables <ref type="bibr" target="#b9">[9]</ref> [10] <ref type="bibr" target="#b11">[11]</ref> <ref type="bibr" target="#b12">[12]</ref>. These measurements include total activity volume, active versus sedentary time, and time spent at home. Additionally, wearable devices are increasingly used to investigate physical activity in populations with cardiovascular disease, multiple sclerosis, arthritis, and other conditions. Although studies have been conducted to predict quantities related to the ALSFRS score, such as its value and slope, to date there are no predictors leveraging smartwatch data to predict the ALSFRS score <ref type="bibr" target="#b13">[13]</ref> <ref type="bibr" target="#b8">[8]</ref>. Hence, it is important to investigate such data extensively to ascertain whether they can improve these predictions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology</head><p>Three different types of data were used to predict ALSFRS-R scores: sensor data from Garmin VivoActive 4 smartwatches, static feature data, and ALSFRS-R questionnaire data. The sensor data consist of 90 different features per day and are characterized by a large number of missing values (on many days no features were recorded, so the sensor vector is absent for those days), due to both the data collection device and patient behavior. The static data are baseline characteristics recorded at a specific time; they include sex, diagnostic delay, age at diagnosis, forced vital capacity (FVC), weight, and body mass index (BMI). Finally, the ALSFRS-R data are of the same type as those to be predicted but collected at a previous time.</p><p>To leverage these challenging data, two different approaches were explored: the Mono Window approach and the Double Window approach. The Mono Window approach is the simpler of the two: for each prediction, only the sensor data recorded within the 7 days prior to the questionnaire to be predicted are used (these can be summarized in various ways, such as taking the mean or the median). The second approach considers two sensor data windows instead of one: the first adjacent to the questionnaire to be predicted, and the second adjacent to the previous available questionnaire. The idea behind this second method is to provide the model with more information about the changes in the recorded parameters over time. Despite the large amount of sensor data (13,946 feature vectors) and the fact that the second method seems more natural for handling this type of data, it is heavily penalized by the irregularity of the sensor recordings. Indeed, two 7-day windows with at least 3 days of sensor data were not available for 20 out of 54 patients. For this reason, the first approach was the only viable choice. 
In this process, for each patient, all sensor data outside the 7-day window preceding the ALSFRS-R values to be predicted were disregarded. The sensor data within each time window were averaged over time to obtain, for each window, one feature vector of length 90 that is less affected by daily variability. These sensor feature vectors were then concatenated with the vectors of static features and with the vectors of the previous ALSFRS-R data, yielding the final feature vectors.</p><p>The outcomes to predict are the values of the ALSFRS-R questionnaires following the last ones available for training. Since the questionnaire values tend to remain constant between visits (see Figure <ref type="figure">1</ref>), the score of the previous questionnaire was used as the prediction baseline and the model was fitted on the residuals. The final prediction was obtained by adding the predicted residual to the value of the previous questionnaire<ref type="foot" target="#foot_0">1</ref>. The Random Forest Classifier was chosen as the model; unlike deep learning models, it does not require large datasets for training, making it suitable for this task. Before being used to fit the model, the data were preprocessed by scaling and feature selection, as explained in Section 4.</p></div>
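<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Mono Window construction and the residual formulation described above can be illustrated with a minimal sketch. It is not the code used for the submission: the function names, the dictionary layout of the sensor data, and the choice of a window covering the 7 days strictly before the questionnaire are assumptions made for illustration, and the toy vectors have 2 sensor features instead of 90.</p>

```python
from datetime import date


def mono_window_mean(daily, visit_day, window_days=7):
    """Average the daily sensor vectors recorded in the window before a visit.

    daily maps a date to a list of sensor values; days without a recording
    are simply absent. The window covers the window_days days strictly
    before visit_day (an assumption). Returns None for an empty window.
    """
    rows = [
        vector
        for day, vector in daily.items()
        if (visit_day - day).days in range(1, window_days + 1)
    ]
    if not rows:
        return None
    return [sum(column) / len(rows) for column in zip(*rows)]


def build_sample(daily, visit_day, static, previous_scores):
    """Concatenate window mean, static features and previous ALSFRS-R scores."""
    window = mono_window_mean(daily, visit_day)
    if window is None:
        return None
    return window + list(static) + list(previous_scores)


def residual_target(current_score, previous_score):
    """Training target: change of an item with respect to the previous visit."""
    return current_score - previous_score


def final_prediction(previous_score, predicted_residual):
    """Add the predicted residual back to the previous questionnaire value."""
    return previous_score + predicted_residual


# Toy data (hypothetical values).
sensors = {
    date(2024, 1, 1): [1.0, 10.0],
    date(2024, 1, 3): [3.0, 30.0],
    date(2023, 12, 1): [100.0, 100.0],  # outside the window, ignored
}
sample = build_sample(
    sensors, date(2024, 1, 5), static=[60.0, 1.0], previous_scores=[4, 3]
)
```

<p>In this sketch a patient with no recordings in the window yields no sample, which reflects how missing sensor days reduce the amount of usable training data.</p></div>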
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimental Setup</head><p>The training dataset for this challenge comprises data from 52 different patients for both tasks. These data can be categorized into three types: static data, ALSFRS-R data, and data from a smartwatch sensor. Analyzing the correlation matrices reveals two important facts. The ALSFRS-R items can be divided into several correlated groups depending on the area affected by the disease (see Figure <ref type="figure">2</ref>). The sensor data, in turn, contain strongly correlated features, as shown in Figure <ref type="figure" target="#fig_2">3</ref>, where the correlated features are grouped into distinct blocks. For the cardiac features, two main blocks can be identified: the first, smaller block pertains to Heart Rate Variability (HRV), while the second block encompasses other cardiac characteristics related to the RR interval. For the respiratory features, two blocks are associated with respiration and blood oxygenation. Lastly, another block of correlated features pertains to the patient's steps. The grouping of features into these blocks is expected, as they represent different statistics describing the same physiological processes.</p><p>The two tasks differ only in who fills out the questionnaires and how frequently. This makes the two tasks slightly different for two main reasons: the data from the first task should reasonably be more objective, as it is a clinician who fills out the questionnaires rather than the individual patient. 
Additionally, the data from the second task are collected through an app and therefore have a higher and more irregular frequency, as depicted in Figure <ref type="figure" target="#fig_4">4</ref>.</p><p>Regarding data preprocessing, the features were first scaled using a Min-Max scaler, and missing values were imputed with the mean or the mode, depending on whether the feature was continuous or categorical. For the sensor data, an additional step was applied: the correlation between each sensor feature and the questionnaire item to be predicted was calculated, and only the features with a correlation above a certain threshold were retained. This approach was chosen because of the low number of samples available to train the model compared to the total number of features. The preprocessed data were then used to train a Random Forest Classifier. To determine the optimal hyperparameters of the model<ref type="foot" target="#foot_1">2</ref>, along with the correlation threshold used to select the sensor features 3 , the following cross-validation strategy was employed. The training set of the challenge was divided into an inner training set and a validation set (80-20%). Hyperparameter optimization was conducted via cross-validation on the inner training set using a grid search 4 , and the hyperparameters chosen for each of the tasks are shown in Table <ref type="table" target="#tab_1">1</ref> and Table <ref type="table" target="#tab_2">2</ref>.</p></div>
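<div xmlns="http://www.tei-c.org/ns/1.0"><p>The scaling and correlation-based feature selection steps described above can be sketched as follows. This is an illustrative reconstruction, not the submitted code: the Pearson coefficient and the use of its absolute value are assumptions (the text only states that features with a correlation above a threshold were retained), and the feature names and values are hypothetical.</p>

```python
import math


def min_max_scale(values):
    """Rescale a feature to [0, 1]; a constant feature is mapped to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pearson(xs, ys):
    """Pearson correlation; returns 0 when either input has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, threshold):
    """Keep the features whose absolute correlation with the target
    reaches the threshold (absolute value is an assumption)."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, target)) >= threshold
    ]


# Toy sensor features (hypothetical names and values) and one outcome.
columns = {
    "steps": [1, 2, 3, 4],
    "noise": [1, -1, -1, 1],
    "flat": [5, 5, 5, 5],
}
outcome = [2, 4, 6, 8]
kept = select_features(columns, outcome, threshold=0.2)
```

<p>With a threshold of 0.0 every feature is retained in this sketch, which would correspond to the rows of Tables 1 and 2 where the selected threshold is 0.0.</p></div>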
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results</head><p>The challenge was divided into two tasks, Task 1 and Task 2, each involving the construction of a model to predict the values of ALSFRS-R questionnaires completed either by a clinician (Task 1) or by the patient using a dedicated smartphone application (Task 2). Two broad types of methods were tested, Mono Window and Double Window, but the low number of patients in the training set resulted in significantly lower performance for the second type. Therefore, we only submitted results from the first type. These results were obtained using a Random Forest Classifier, trained after determining the optimal hyperparameters through five-fold cross-validation; the average metrics for each ALSFRS-R question over the 5 validation folds are reported in Table <ref type="table" target="#tab_3">3</ref> for the first task and in Table <ref type="table" target="#tab_4">4</ref> for the second.</p></div>
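<div xmlns="http://www.tei-c.org/ns/1.0"><p>For reference, the two metrics reported in this section can be computed as in the sketch below. The scores are made-up values for a single ALSFRS-R item, and the comparison with a persistence baseline (repeating the previous score) is added only to illustrate why low errors are easy to reach when scores rarely change; it is not a result from the challenge.</p>

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical scores of one item at four visits, and the scores
# observed at the respective previous visits.
current = [4, 3, 3, 2]
previous = [4, 4, 3, 3]

# Persistence baseline: predict that the score does not change.
baseline_mae = mae(current, previous)
baseline_rmse = rmse(current, previous)
```

<p>Because RMSE squares each error before averaging, it is never smaller than MAE, which is consistent with the RMSE values in Tables 3 and 4 being larger than the corresponding MAE values.</p></div>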
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusions and Future Work</head><p>The models used to predict the ALSFRS-R questionnaire values were constrained by the small size of the training dataset. Despite experimenting with models using various time intervals, the only ones that proved useful for prediction were those relying solely on a window of sensor data adjacent to the questionnaire to be predicted, without leveraging information from more distant times. This limitation stems from the challenging nature of the data, which contain a large number of missing values. Among the models tested, the one with the best performance, subsequently used for submission, was based on Random Forest, preceded by a feature selection step to reduce the number of sensor features. The performance obtained varies considerably depending on the questionnaire item. The Mean Absolute Error (MAE) calculated on the test set is 0.23 and 0.31 for Task 1 and Task 2, respectively, while the Root Mean Squared Error (RMSE) is 0.52 and 0.60. The lower error is observed in the first task, which could be attributed to the fact that questionnaires compiled by clinical staff tend to be more reliable and objective than the patient's subjective assessment. These seemingly promising results are unfortunately due to the ALSFRS-R questionnaires mostly remaining constant from one visit to the next, which makes it very easy to achieve high prediction performance.</p><p>One potential improvement is to use data augmentation to increase the number of questionnaires in the training set. To improve predictions, deep learning methods that leverage much longer sequences of sensor data (such as recurrent neural networks) could also be tested. 
However, these models require significantly more data for training, which represents the biggest obstacle for this task.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Frequencies of residual values in the training ALSFRS-R data for tasks 1 and 2. The most frequent value is 0 for each task; consequently, most questionnaire values remain unchanged with respect to the previous ones.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Correlation matrices of ALSFRS-R values in the training data for task 1 and task 2. They reveal that the scores can be clustered into distinct groups, indicating significant correlations among certain ALSFRS-R items.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Sensors Correlation Matrix.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Histogram of the difference in days between consecutive questionnaires for task 1 and task 2. For task 2, the questionnaires are filled out at more irregular intervals and with greater frequency than for task 1.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1</head><label>1</label><figDesc>Task 1: Random Forest hyperparameters and correlation threshold used for feature selection, for each question. Sensor features with a correlation to the outcome lower than the specified threshold were removed.</figDesc><table><row><cell>Q</cell><cell>depth</cell><cell>max features</cell><cell>correlation threshold</cell></row><row><cell>Q1</cell><cell>3</cell><cell>sqrt</cell><cell>0.2</cell></row><row><cell>Q2</cell><cell>2</cell><cell>sqrt</cell><cell>0.0</cell></row><row><cell>Q3</cell><cell>3</cell><cell>log2</cell><cell>0.0</cell></row><row><cell>Q4</cell><cell>8</cell><cell>sqrt</cell><cell>0.25</cell></row><row><cell>Q5</cell><cell>7</cell><cell>log2</cell><cell>0.15</cell></row><row><cell>Q6</cell><cell>6</cell><cell>sqrt</cell><cell>0.2</cell></row><row><cell>Q7</cell><cell>8</cell><cell>log2</cell><cell>0.1</cell></row><row><cell>Q8</cell><cell>9</cell><cell>sqrt</cell><cell>0.15</cell></row><row><cell>Q9</cell><cell>6</cell><cell>sqrt</cell><cell>0.15</cell></row><row><cell>Q10</cell><cell>2</cell><cell>log2</cell><cell>0.0</cell></row><row><cell>Q11</cell><cell>2</cell><cell>log2</cell><cell>0.0</cell></row><row><cell>Q12</cell><cell>3</cell><cell>log2</cell><cell>0.25</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2</head><label>2</label><figDesc>Task 2: Random Forest hyperparameters and correlation threshold used for feature selection, for each question. Sensor features with a correlation to the outcome lower than the specified threshold were removed.</figDesc><table><row><cell>Q</cell><cell>depth</cell><cell>max features</cell><cell>correlation threshold</cell></row><row><cell>Q1</cell><cell>4</cell><cell>sqrt</cell><cell>0.25</cell></row><row><cell>Q2</cell><cell>5</cell><cell>log2</cell><cell>0.0</cell></row><row><cell>Q3</cell><cell>3</cell><cell>sqrt</cell><cell>0.0</cell></row><row><cell>Q4</cell><cell>4</cell><cell>log2</cell><cell>0.0</cell></row><row><cell>Q5</cell><cell>7</cell><cell>sqrt</cell><cell>0.05</cell></row><row><cell>Q6</cell><cell>9</cell><cell>log2</cell><cell>0.0</cell></row><row><cell>Q7</cell><cell>7</cell><cell>sqrt</cell><cell>0.0</cell></row><row><cell>Q8</cell><cell>9</cell><cell>log2</cell><cell>0.25</cell></row><row><cell>Q9</cell><cell>8</cell><cell>sqrt</cell><cell>0.25</cell></row><row><cell>Q10</cell><cell>8</cell><cell>log2</cell><cell>0.25</cell></row><row><cell>Q11</cell><cell>3</cell><cell>sqrt</cell><cell>0.0</cell></row><row><cell>Q12</cell><cell>4</cell><cell>sqrt</cell><cell>0.15</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3</head><label>3</label><figDesc>Performance metrics on the task one validation set. They have been calculated by taking the mean across the five folds for each Q.</figDesc><table><row><cell>Q</cell><cell>MAE (std)</cell><cell>RMSE (std)</cell></row><row><cell>Q1</cell><cell>0.155 (0.080)</cell><cell>0.385 (0.0971)</cell></row><row><cell>Q2</cell><cell>0.091 (0.019)</cell><cell>0.298 (0.0317)</cell></row><row><cell>Q3</cell><cell>0.159 (0.028)</cell><cell>0.397 (0.034)</cell></row><row><cell>Q4</cell><cell>0.266 (0.106)</cell><cell>0.508 (0.107)</cell></row><row><cell>Q5</cell><cell>0.337 (0.112)</cell><cell>0.575 (0.096)</cell></row><row><cell>Q6</cell><cell>0.370 (0.045)</cell><cell>0.658 (0.063)</cell></row><row><cell>Q7</cell><cell>0.280 (0.054)</cell><cell>0.528 (0.051)</cell></row><row><cell>Q8</cell><cell>0.229 (0.071)</cell><cell>0.473 (0.080)</cell></row><row><cell>Q9</cell><cell>0.319 (0.076)</cell><cell>0.640 (0.098)</cell></row><row><cell>Q10</cell><cell>0.165 (0.024)</cell><cell>0.397 (0.030)</cell></row><row><cell>Q11</cell><cell>0.000 (0.000)</cell><cell>0.000 (0.000)</cell></row><row><cell>Q12</cell><cell>0.128 (0.045)</cell><cell>0.500 (0.089)</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 4</head><label>4</label><figDesc>Performance metrics on the task two validation set. They have been calculated by taking the mean across the five folds for each Q.</figDesc><table><row><cell>Q</cell><cell>MAE (std)</cell><cell>RMSE (std)</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">The samples in the training set corresponding to a residual value occurring fewer than 8 times have been discarded; indeed, they are too few to be recognized by the model.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">The Random Forest hyperparameters tested were the maximum depth and max features; they were tested in the range [2, 9] and among the values ['sqrt', 'log2', None], respectively. The number of trees was fixed at 300.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">The thresholds tested were [0, 0.05, 0.1, 0.15, 0.2, 0.25].</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">During the fold creation process, care was taken to obtain stratified folds for outcomes to ensure partitions with similar percentages for each outcome category.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Amyotrophic lateral sclerosis</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">C</forename><surname>Wijesekera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">Nigel</forename><surname>Leigh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Orphanet journal of rare diseases</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="1" to="22" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">The epidemiology of amyotrophic lateral sclerosis</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">O</forename><surname>Talbott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Malek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lacomis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Handbook of clinical neurology</title>
		<imprint>
			<biblScope unit="volume">138</biblScope>
			<biblScope unit="page" from="225" to="238" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">The ALSFRS-R: a revised ALS functional rating scale that incorporates assessments of respiratory function</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Cedarbaum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Stambler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Malta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Fuller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hilt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Thurmond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nakanishi</surname></persName>
		</author>
		<author>
			<orgName>BDNF ALS Study Group (Phase III)</orgName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the neurological sciences</title>
		<imprint>
			<biblScope unit="volume">169</biblScope>
			<biblScope unit="page" from="13" to="21" />
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
	<note>complete listing of the BDNF Study Group</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title/>
		<author>
			<persName><forename type="first">G</forename><surname>Birolo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bosoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Faggioli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Aidos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bergamaschi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cavalla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chiò</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dagliati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Carvalho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">M</forename><surname>Di Nunzio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fariselli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>García Dominguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gromicho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Guazzo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Longato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">C</forename><surname>Madeira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Manera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Marchesin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Menotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Silvello</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Tavazzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Tavazzi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title/>
		<imprint>
			<date/>
		</imprint>
	</monogr>
	<note>0.063 Q1 0 0 372 0.050) 0.586 (0.062) Q3 0.083 (0.016) 0.286 (0.029) Q4 0.197 (0.045) 0.442 (0.050) Q5 0.360 (0.059) 0.657 (0.040) Q6 0.324 (0.070) 0.572 (0.062) Q7 0.320 (0.056) 0.563 (0.051) Q8 0.229 (0.066) 0.474 (0.071) Q9 0.226 (0.049) 0.473 (0.055)</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Overview of iDPP@CLEF 2024: The intelligent disease progression prediction challenge</title>
		<author>
			<persName><forename type="first">I</forename><surname>Trescato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vettoretti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Di Camillo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting><address><addrLine>Grenoble, France</addrLine></address></meeting>
		<imprint>
			<publisher>CEUR-WS.org</publisher>
			<date type="published" when="2024">September 9th to 12th, 2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Intelligent disease progression prediction: Overview of iDPP@CLEF 2024</title>
		<author>
			<persName><forename type="first">G</forename><surname>Birolo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bosoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Faggioli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Aidos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bergamaschi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cavalla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chiò</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dagliati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Carvalho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">M</forename><surname>Di Nunzio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fariselli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>García Dominguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gromicho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Guazzo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Longato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">C</forename><surname>Madeira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Manera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Marchesin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Menotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Silvello</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Tavazzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Tavazzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Trescato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vettoretti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Di Camillo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Experimental IR Meets Multilinguality, Multimodality, and Interaction - 15th International Conference of the CLEF Association, CLEF 2024</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<meeting><address><addrLine>Grenoble, France</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2024">September 9th to 12th, 2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Stratification of amyotrophic lateral sclerosis patients: a crowdsourcing approach</title>
		<author>
			<persName><forename type="first">R</forename><surname>Kueffner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Zach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bronfeld</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Norel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Atassi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Balagurusamy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Di Camillo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chiò</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cudkowicz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Dillenberger</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Scientific reports</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page">690</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Model-based and model-free techniques for amyotrophic lateral sclerosis diagnostic prediction and patient clustering</title>
		<author>
			<persName><forename type="first">M</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Goutman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kalinin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mukherjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Guan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">D</forename><surname>Dinov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neuroinformatics</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="page" from="407" to="421" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Deep learning methods to predict amyotrophic lateral sclerosis disease progression</title>
		<author>
			<persName><forename type="first">C</forename><surname>Pancotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Birolo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Rollo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Sanavia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Di Camillo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Manera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chiò</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fariselli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Scientific reports</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page">13738</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Upper limb movements as digital biomarkers in people with ALS</title>
		<author>
			<persName><forename type="first">M</forename><surname>Straczkiewicz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Karas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Johnson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">M</forename><surname>Burke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Scheier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">B</forename><surname>Royse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Calcagno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Iyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Berry</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">EBioMedicine</title>
		<imprint>
			<biblScope unit="volume">101</biblScope>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Objectively monitoring amyotrophic lateral sclerosis patient symptoms during clinical trials with sensors: observational study</title>
		<author>
			<persName><forename type="first">L</forename><surname>Garcia-Gancedo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Kelly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lavrov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Parr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Hart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Marsden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">R</forename><surname>Turner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Talbot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Chiwera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">E</forename><surname>Shaw</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">JMIR mHealth and uHealth</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page">e13433</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Mobility disability and the pattern of accelerometer-derived sedentary and physical activity behaviors in people with multiple sclerosis</title>
		<author>
			<persName><forename type="first">V</forename><surname>Ezeugwu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">E</forename><surname>Klaren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">A</forename><surname>Hubbard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">T</forename><surname>Manns</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">W</forename><surname>Motl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Preventive medicine reports</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="241" to="246" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Accelerometry for remote monitoring of physical activity in amyotrophic lateral sclerosis: a longitudinal cohort study</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">P</forename><surname>Van Eijk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">N</forename><surname>Bakers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">M</forename><surname>Bunte</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J</forename><surname>De Fockert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Eijkemans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">H</forename><surname>Van Den Berg</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of neurology</title>
		<imprint>
			<biblScope unit="volume">266</biblScope>
			<biblScope unit="page" from="2387" to="2395" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Using an onset-anchored Bayesian hierarchical model to improve predictions for amyotrophic lateral sclerosis disease progression</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">G</forename><surname>Karanevich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Statland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">J</forename><surname>Gajewski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>He</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC medical research methodology</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page" from="1" to="13" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
