Evaluating Multi-task Curriculum Learning for Forecasting Energy Consumption in Electric Heavy-duty Vehicles

Yuantao Fan1,*, Sławomir Nowaczyk1, Zhenkan Wang2 and Sepideh Pashami1,3

1 Center for Applied Intelligent Systems Research (CAISR), Halmstad University, Kristian IV:s väg 3, 301 18 Halmstad, Sweden
2 Volvo Group, Gropegårdsgatan 2, 417 15 Göteborg, Sweden
3 Research Institutes of Sweden (RISE), Isafjordsgatan 28 A, 164 40 Kista, Sweden

Abstract

Accurate energy consumption prediction is crucial for optimising the operation of electric commercial heavy-duty vehicles, particularly for efficient route planning, refining charging strategies, and ensuring optimal truck configuration for specific tasks. This study investigates the application of multi-task curriculum learning to enhance machine learning models for forecasting the energy consumption of various onboard systems in electric vehicles. Multi-task learning, unlike traditional training approaches, leverages auxiliary tasks to provide additional training signals, which has been shown to enhance predictive performance in many domains. By further incorporating curriculum learning, where simpler tasks are learned before progressing to more complex ones, neural network training becomes more efficient and effective. We evaluate the suitability of these methodologies in the context of electric vehicle energy forecasting, examining whether the combination of multi-task learning and curriculum learning enhances algorithm generalisation, even with limited training data. We primarily focus on understanding the efficacy of different curriculum learning strategies, including sequential learning and progressive continual learning, using complex, real-world industrial data. Our research further explores a set of auxiliary tasks designed to facilitate the learning process by targeting key consumption characteristics projected into future time frames.
The findings illustrate the potential of multi-task curriculum learning to advance energy consumption forecasting, significantly contributing to the optimisation of electric heavy-duty vehicle operations. This work offers a novel perspective on integrating advanced machine learning techniques to enhance energy efficiency in the exciting field of electromobility.

Keywords: Energy Consumption Forecasting, Curriculum Learning, Multi-task Learning, Electric Vehicles

HAII5.0: Embracing Human-Aware AI in Industry 5.0, at ECAI 2024, 19 October 2024, Santiago de Compostela, Spain.
* Corresponding author.
yuantao.fan@hh.se (Y. Fan); slawomir.nowaczyk@hh.se (S. Nowaczyk); zhenkan.wang@volvo.com (Z. Wang); sepideh.pashami@hh.se (S. Pashami)
ORCID: 0000-0002-3034-6630 (Y. Fan); 0000-0002-7796-5201 (S. Nowaczyk); 0000-0003-3272-4145 (S. Pashami)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Predicting energy consumption for electric vehicles (EVs), especially those used in commercial heavy-duty contexts, is paramount for improving their operational efficiency and promoting sustainability. Effective energy consumption forecasts are indispensable for strategic route planning, optimising charging protocols, and ensuring that vehicle configurations align well with specific operational demands. As electric vehicles gain traction as a viable and eco-friendly alternative to internal combustion engine vehicles, the importance of precise energy consumption predictions becomes increasingly pronounced. The challenges in this domain are multifaceted, stemming from the inherent variability in driving conditions, vehicle load, and diverse environmental factors, which collectively complicate the development of accurate predictive models.
Overcoming these obstacles is essential not only for enhancing the reliability and performance of EVs but also for minimising operational costs and boosting the overall efficiency of electric transport systems. The transition to electric vehicles is a significant step towards reducing greenhouse gas emissions and achieving sustainable transportation goals. However, since limited energy storage puts unique constraints on which operations are feasible, the benefits of EVs can only be fully realised through the development of specialised forecasting methods that accurately anticipate energy needs. In this context, AI and ML emerge as transformative tools. AI-driven models can analyse vast amounts of data to uncover patterns and relationships that are not immediately apparent, providing more accurate and reliable energy consumption forecasts. These models can adapt to new data, continuously improving their predictions over time. Nevertheless, energy consumption forecasting for EVs faces critical challenges, such as dynamic driving conditions and fluctuating loads, which make even state-of-the-art methods struggle to handle complex real-world data effectively. While the potential to learn from historical data and identify trends that influence energy consumption is the biggest strength of ML-based approaches, it is crucial to develop robust models that can generalise well across different scenarios and vehicle types. The complexity and variability inherent in forecasting energy consumption for electric vehicles make it a relevant testing ground for cutting-edge modelling techniques that promise to handle diverse and dynamic data inputs. In particular, Multi-Task Learning (MTL) presents a compelling solution by enabling simultaneous training across multiple related tasks, thereby leveraging shared information to improve the predictive performance of each task. In contrast, traditional training settings utilise only the target task.
MTL is particularly beneficial in scenarios with limited training data, as it enhances generalisation by incorporating auxiliary tasks that provide additional training signals. Moreover, the efficacy of MTL can be further amplified by integrating curriculum learning (CL), which structures the learning process in a progressive manner. Curriculum learning organises tasks from simple to complex, allowing the model to build a robust foundation before tackling more challenging problems. By combining these methodologies into multi-task curriculum learning (MCL), we can efficiently train neural networks that not only perform better on individual tasks but also generalise more effectively across different contexts. MCL optimises the learning trajectory, ensuring that simpler tasks enhance the model's capability to learn more complex ones, ultimately leading to more accurate and reliable energy consumption forecasts for electric heavy-duty vehicles. This integrative approach has been shown to be a potent strategy for addressing multifaceted challenges in several domains, but it has not been applied to EV auxiliary energy forecasting before. Thus, this paper aims to evaluate the suitability of MCL in this real-world, complex scenario. Generating a set of auxiliary tasks is a critical step in the implementation of MCL, and how to do it for forecasting energy consumption in EVs requires experimental evaluation. To create auxiliary tasks, one must first obtain an understanding of the primary task, identifying key factors and variables that influence energy consumption and the types of patterns that are indicative of future behaviour. These factors often include vehicle load, driving speed, route characteristics, weather conditions, and driver behaviour. Each of these variables can serve as the basis for an auxiliary task.
For instance, an auxiliary task might involve predicting the impact of vehicle load on energy consumption under different traffic conditions or estimating the effect of varying driving speeds on battery usage. Historical data from real-world vehicle operations can be mined to extract relevant patterns and correlations, which can then be used to define these auxiliary tasks. In this paper, we have decided to focus on the patterns within the forecasted value itself instead of exploiting multivariate vehicle signals. In particular, we define several types of energy consumption characteristics as targets for the auxiliary tasks, such as whether the consumption in the next time frame exceeds the global mean, whether the consumption will be higher in the next time frame compared to the current consumption, or the consumption difference between the start and the end of the next time frame. These tasks are general enough to be suitable for any forecasting task, while at the same time being sufficiently closely related to the actual primary task to, hopefully, provide useful information to boost the training process. The core contribution of this paper is the evaluation of several multi-task curriculum learning techniques for forecasting the energy consumption of heavy-duty electric vehicles, including the proposition of utilising key consumption characteristics as targets for generating auxiliary tasks for MCL. A comparison of MCL variants, combining curriculum learning strategies (sequential learning and progressive continual learning) with different auxiliary tasks, illustrates the performance improvements on real-world data collected from the normal operation of commercial electric transport vehicles.
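To make the auxiliary targets concrete, the sketch below derives three of the characteristics described above from a univariate consumption series. This is a minimal illustration under one plausible reading of the targets; the function and variable names are our own, not from the paper's codebase.

```python
# Minimal sketch: deriving auxiliary-task targets from a univariate
# consumption series. Names (make_auxiliary_targets, window) are
# illustrative assumptions, not the paper's implementation.

def make_auxiliary_targets(series, t0, window, global_mean):
    """Targets for the next time frame [t0, t0 + window)."""
    frame = series[t0:t0 + window]
    return {
        # classification: does next-frame consumption exceed the global mean?
        "exceeds_global_mean": float(sum(frame) / len(frame) > global_mean),
        # classification: is the next frame higher than the current value?
        "higher_than_now": float(sum(frame) / len(frame) > series[t0 - 1]),
        # regression: difference between end and start of the next frame
        "start_end_diff": frame[-1] - frame[0],
    }

consumption = [1.0, 1.2, 1.5, 1.1, 0.9, 1.4, 1.6, 1.3]
targets = make_auxiliary_targets(consumption, t0=4, window=4,
                                 global_mean=sum(consumption) / len(consumption))
print(targets)
```

Because all three targets are computed from the same future window as the primary regression target, they come at no extra labelling cost.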
The experimental results show that progressive continual learning, with a logistic growth weighting function governing the learning balance between the primary and the auxiliary task, achieves the best performance. The results also show that the first auxiliary task is the most helpful for subsystems 1 and 4, while the third auxiliary task is the most helpful for subsystems 2 and 3. Furthermore, it is observed that MCL with the proposed auxiliary tasks can improve the learning efficiency of the model, achieving faster convergence to a point beyond which the gain from further training is limited.

2. Related Work

Curriculum learning enables the training of machine learning models in a meaningful order, from easy samples to difficult and complex samples [1]. A common approach to CL introduces an easy-to-hard ordering of samples in the training process, e.g., vanilla CL, self-paced CL, balanced CL, etc. When multiple tasks are available, an easy-to-hard ordering of the tasks to be learned can be applied as well. Multi-task learning shares information across a set of related tasks during training, and performance can be further improved [2] via, e.g., GradNorm [3], which balances the losses between multiple tasks. While most multi-task learning approaches aim to learn multiple tasks simultaneously, progressive curriculum learning determines the best order in which to learn them to maximise the final result. Pentina et al. [4] find the best order of tasks to be learned in a sequence based on a generalisation bound criterion, optimising the average expected classification performance over all the tasks. Siahpour et al. [5] introduced a penalty coefficient, a function of the training epoch, that governs the training process by suppressing the loss (and hence the noise) from the domain discrimination task in the early stage, ensuring efficient training of neural networks. Shi et al.
proposed progressive contrastive learning [6] based on multiple prototypes in the dataset; the training process is ordered to learn the centroid prototype first, followed by the hard prototype, and finally the dynamic prototype. In this work, we explore sequential learning and progressive continual learning with a set of auxiliary tasks generated based on key characteristics of the target signal.

3. Problem Formulation

For a given primary learning task $\mathcal{T}_i$, we create a set of auxiliary tasks $\mathcal{T}_i^j$, where $\mathcal{T}_i$ corresponds to the primary task (in our case, forecasting the energy consumption of the $i$-th auxiliary subsystem in an electric truck), and $\mathcal{T}_i^j$ corresponds to the $j$-th type of auxiliary task. The majority of multi-task learning studies aim to learn all relevant tasks together to improve the performance of each task $\mathcal{T}_i$. In our study, we are only interested in improving the energy forecasting tasks $\mathcal{T}_i$, not the generated auxiliary tasks $\mathcal{T}_i^j$. All energy forecasting and auxiliary tasks are learned from the same dataset: multivariate time series sensor readings collected from the normal operation of several heavy-duty electric vehicles.

Let us denote the multivariate time series of each vehicle $v$ by $X = \{ x_{v,t}^k \mid t = 1, 2, \ldots, T_e(v),\; k = 1, 2, \ldots, K \}$, where $x_{v,t}^k$ is the value of the $k$-th feature for a vehicle/trajectory $v$ at time $t$, and $T_e(v)$ corresponds to the end of the recording. A subset of the features $u_{v,t}^i$ reflects the energy consumption of subsystem $i$ at time $t$. The target energy consumption $y_{v,t_0}^i$ over a future time frame $\tau_{ph}$ can be approximated by summing up the energy consumed over this time frame: $y_{v,t_0}^i = \sum_{t \in [t_0, t_0 + \tau_{ph}]} p_i(t) \cdot \Delta t$, where $p_i(t)$ is the power consumption at time $t$, and $\Delta t$ is the time interval between two samples. In this study, we set $\tau_{ph}$ equal to 10 minutes.
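The target computation above amounts to a discrete integral of power over the prediction horizon. A minimal sketch, with illustrative names and example values of our own:

```python
# Minimal sketch of the forecast target: energy consumed over the next
# time frame, y = sum over [t0, t0 + tau_ph] of p_i(t) * dt.
# Names (energy_target, power_kw, dt_h) are illustrative assumptions.

def energy_target(power, t0, horizon_samples, dt):
    """Approximate energy consumed over the prediction horizon."""
    window = power[t0:t0 + horizon_samples]
    return sum(p * dt for p in window)

# Example: power in kW sampled every minute, 10-minute horizon (as in
# the paper, tau_ph = 10 minutes); dt expressed in hours gives kWh.
power_kw = [2.0, 2.5, 3.0, 2.0, 1.5, 1.0, 2.0, 2.5, 3.0, 2.0, 1.0, 0.5]
dt_h = 60 / 3600.0        # 1-minute sampling interval, in hours
horizon = 10              # 10 samples = 10 minutes
print(energy_target(power_kw, t0=0, horizon_samples=horizon, dt=dt_h))
```

With uniform sampling, the sum reduces to the windowed mean power times the horizon length, which is why the approximation improves as $\Delta t$ shrinks.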
For a given forecasting task $\mathcal{T}_i$, a regression model $f_i(\cdot)$ is trained together with one of the auxiliary tasks $\mathcal{T}_i^j$ to estimate the consumption $y_{v,t}^i$. In this study, neural networks with a shared feature extractor and multiple heads, each corresponding to one task, were trained under different settings and evaluated for their performance after 200 training epochs. We explore different multi-task curriculum learning settings and auxiliary tasks for forecasting energy consumption. The MCL methods were compared to the traditional single-task approach.

4. Method

4.1. Auxiliary Tasks

For a given regression task $\mathcal{T}_i$ (forecasting energy consumption for one of the subsystems), a set of auxiliary tasks was generated to assist the learning process. We explore the use of five types of consumption characteristics as targets for creating the auxiliary tasks: i) $\mathcal{T}_i^1$: classifying whether the consumption in the next time frame exceeds the global mean for subsystem $i$; ii) $\mathcal{T}_i^2$: classifying whether the consumption will increase in the next time frame, compared with the current consumption; iii) $\mathcal{T}_i^3$: classifying whether the consumption at the end of the next time frame exceeds that at its start; iv) $\mathcal{T}_i^4$: predicting the consumption difference between the start and the end of the next time frame; v) $\mathcal{T}_i^5$: predicting the difference between the peak consumption and the lowest consumption in the next time frame. The first three auxiliary tasks are classification tasks; the other two are regression tasks. Learning to predict these key consumption characteristics in the auxiliary tasks $\mathcal{T}_i^j$, along with the primary tasks $\mathcal{T}_i$, under MCL, is evaluated for its usefulness.

4.2. Network Architecture

The regression model evaluated for MCL in this study builds on a multi-layer perceptron. The model comprises a shared feature extractor and two heads: one head carries out the main task $\mathcal{T}_i$, and the other corresponds to one of the five auxiliary tasks $\mathcal{T}_i^j$.
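The shared-extractor, two-head structure can be sketched as a forward pass in plain Python. This is an illustration only, assuming small illustrative layer sizes; the paper's actual model is a PyTorch MLP, and all names here are our own.

```python
import math
import random

# Minimal sketch of a shared feature extractor with two heads: one
# regression head for the primary task T_i and one sigmoid head for a
# classification-type auxiliary task T_i^j. Sizes are illustrative.
random.seed(0)

def linear(x, w, b):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

def relu(v):
    return [max(0.0, vi) for vi in v]

def init(n_out, n_in):
    w = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

n_in, n_hidden = 8, 16
w_shared, b_shared = init(n_hidden, n_in)   # shared feature extractor
w_main, b_main = init(1, n_hidden)          # head for primary task T_i
w_aux, b_aux = init(1, n_hidden)            # head for auxiliary task T_i^j

def forward(x):
    h = relu(linear(x, w_shared, b_shared))                       # shared features
    y_main = linear(h, w_main, b_main)[0]                         # regression output
    y_aux = 1.0 / (1.0 + math.exp(-linear(h, w_aux, b_aux)[0]))   # sigmoid output
    return y_main, y_aux

y_main, y_aux = forward([0.5] * n_in)
```

Because both heads read the same hidden representation, gradients from the auxiliary loss shape the shared extractor, which is the mechanism by which the auxiliary tasks can help the primary one.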
The network architecture is illustrated in Figure 1. For auxiliary tasks that are classification tasks, a sigmoid function is applied to the output of the corresponding head.

4.3. Curriculum Learning Strategy

The two curriculum learning strategies evaluated in this work are sequential learning (SeqL) and progressive continual learning (PCL). The overall optimisation loss $\mathcal{L}$ is defined as:

$\mathcal{L} = \lambda \mathcal{L}_{\mathcal{T}_i} + (1 - \lambda) \mathcal{L}_{\mathcal{T}_i^j}$  (1)

where $\mathcal{L}_{\mathcal{T}_i}$ denotes the loss for the primary task, while $\mathcal{L}_{\mathcal{T}_i^j}$ denotes the loss for the auxiliary task $j$. SeqL imposes a fixed ordering of the tasks, e.g. learning the auxiliary task first, before a predetermined epoch, and the primary task afterwards:

$\lambda_{SeqL} = \begin{cases} 0 & \text{if } \eta < \mathcal{N}_{ep} \\ 1 & \text{if } \eta \geq \mathcal{N}_{ep} \end{cases}$  (2)

where $\eta$ is the current training epoch, and $\mathcal{N}_{ep}$ is the predetermined epoch at which to switch tasks. PCL employs a weighting mechanism, a function of the training epoch, to govern the learning process and gradually increase the weight on the loss corresponding to the primary task:

$\lambda_{PCL} = \frac{2}{1 + \exp(-10 \alpha \eta / \mathcal{N}_{tot})} - 1$  (3)

where $\alpha$ is a coefficient governing the rate of change (see Figure 2 for an illustration), $\eta$ is the current training epoch, and $\mathcal{N}_{tot}$ is the total number of training epochs. The two curriculum learning strategies were compared with MTL without any special curriculum learning, and with learning only the primary task $\mathcal{T}_i$.

Figure 1: MLP network architecture.

The two evaluation criteria in this study are (i) the test loss (mean absolute error, MAE) after training has converged, i.e., after $\mathcal{N}_{tot}$ epochs, and (ii) the convergence time, i.e., the epoch at which the test loss reaches a saturation point (no further significant decrease in the loss afterwards). In each case, different variants of MCL are compared against a learning process without any multi-task curriculum learning. The saturation point is detected using a knee point detection algorithm [7] proposed by Satopaa et al.
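The two weighting schedules, Eq. (2) and Eq. (3), and the combined loss of Eq. (1) can be sketched directly (a minimal illustration; function names are our own):

```python
import math

# Sketch of the two curriculum weighting schedules, Eq. (2) and Eq. (3),
# and the combined loss of Eq. (1). Function names are illustrative.

def lambda_seql(epoch, n_switch):
    """Sequential learning: auxiliary task only, then primary task only."""
    return 0.0 if epoch < n_switch else 1.0

def lambda_pcl(epoch, n_total, alpha):
    """Progressive continual learning: logistic growth from 0 towards 1."""
    return 2.0 / (1.0 + math.exp(-10.0 * alpha * epoch / n_total)) - 1.0

def combined_loss(loss_primary, loss_aux, lam):
    """Eq. (1): L = lam * L_primary + (1 - lam) * L_aux."""
    return lam * loss_primary + (1.0 - lam) * loss_aux

# The PCL weight starts at 0 (all auxiliary) and shifts towards the
# primary task as training progresses; alpha controls how quickly.
for epoch in (0, 50, 100, 200):
    print(epoch, lambda_pcl(epoch, n_total=200, alpha=0.3))
```

Note that $\lambda_{PCL}(0) = 0$ exactly, so PCL, like SeqL, begins with all weight on the auxiliary task; unlike SeqL, the hand-over to the primary task is gradual rather than a hard switch.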
Figure 2: PCL weighting function over 200 epochs for α ∈ {0.1, 0.3, 0.6, 1}.

5. Experiment Result

The energy consumption dataset was collected from several electric trucks operating in different countries over a couple of months; it includes sensor readings of mileage, speed, ambient temperature, the energy consumed by auxiliary subsystems, etc., recorded during driving sessions. The four subsystems we forecast energy consumption for are the air compressor 𝒯1, the air conditioner 𝒯2, the cabin heater 𝒯3, and the heater of the energy storage system 𝒯4. For the experiments conducted in this paper, the neural networks were implemented with the PyTorch library [8], using the Adam optimiser with a learning rate of 0.001. The loss function for the regression tasks is mean absolute error (MAE), and binary cross-entropy (BCE) was employed as the loss function for the classification tasks. The total number of training epochs 𝒩tot is set to 200. For the sequential learning strategy, 𝒩ep is set to 100, and for progressive continual learning, α values of 0.1 (i.e., a near-linear schedule) and 0.3 are tested. The experiments were conducted using 4-fold cross-validation, split session-wise, i.e., data from the same driving session never appears in both the training and the testing population. Table 1 and Table 2 show the training and testing losses after 200 epochs of training of the neural networks using multi-task learning without any curriculum learning (MTL), sequential learning (SeqL), and progressive continual learning with an α of 0.1 (PCL-lin) and an α of 0.3 (PCL-exp). The baseline performance, single-task learning (STL), is produced by learning only the primary task 𝒯i for each subsystem and is shown in parentheses. Both tables show that the lowest averaged MAE is achieved using PCL-exp.
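The session-wise split used in these experiments can be sketched as follows (a minimal illustration; names such as `session_folds` are our own, and any grouped k-fold utility would serve the same purpose):

```python
# Minimal sketch of session-wise 4-fold cross-validation: all samples
# from one driving session land in the same fold, so no session is
# shared between training and test sets. Names are illustrative.

def session_folds(session_ids, k=4):
    """Map each unique session to a fold; return per-sample fold indices."""
    unique = sorted(set(session_ids))
    fold_of = {s: i % k for i, s in enumerate(unique)}
    return [fold_of[s] for s in session_ids]

samples = ["s1", "s1", "s2", "s3", "s3", "s4", "s5", "s5"]
folds = session_folds(samples, k=4)
# Samples from the same session always share a fold index.
assert folds[0] == folds[1] and folds[3] == folds[4]
```

Splitting by session rather than by sample avoids leaking near-duplicate consecutive readings from one drive into both training and test sets, which would otherwise inflate the test scores.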
As a sanity check, Table 1 demonstrates that the training losses of most MCL methods, after 200 epochs of training, converged to a level comparable to STL.

Table 1: Comparison of training loss after 200 epochs for different MCL approaches using different auxiliary tasks. The reference performances (using STL) are placed in parentheses. MCL results outperforming the baseline are highlighted in bold, and the best performance for each subsystem is underlined.

Task1 (0.6202 ± 0.0163)
            MTL               SeqL              PCL-lin           PCL-exp
AuxTask1    0.6125 ± 0.0141   0.6054 ± 0.0167   0.6090 ± 0.0185   0.6013 ± 0.0156
AuxTask2    0.6374 ± 0.0147   0.6502 ± 0.0138   0.6802 ± 0.0096   0.6435 ± 0.0156
AuxTask3    0.6165 ± 0.0166   0.6131 ± 0.0118   0.6076 ± 0.0154   0.6033 ± 0.0155
AuxTask4    0.6239 ± 0.0132   0.6260 ± 0.0119   0.6567 ± 0.0161   0.6182 ± 0.0121
AuxTask5    0.6256 ± 0.0113   0.6152 ± 0.0121   0.6250 ± 0.0147   0.6140 ± 0.0117

Task2 (0.2617 ± 0.0245)
            MTL               SeqL              PCL-lin           PCL-exp
AuxTask1    0.2681 ± 0.0230   0.2760 ± 0.0128   0.2959 ± 0.0157   0.2541 ± 0.0171
AuxTask2    0.2619 ± 0.0158   0.3016 ± 0.0308   0.2939 ± 0.0204   0.2475 ± 0.0370
AuxTask3    0.2662 ± 0.0395   0.2862 ± 0.0324   0.2379 ± 0.0245   0.2158 ± 0.0186
AuxTask4    0.2534 ± 0.0110   0.2866 ± 0.0255   0.2795 ± 0.0202   0.2366 ± 0.0432
AuxTask5    0.2638 ± 0.0168   0.2691 ± 0.0361   0.2971 ± 0.0167   0.2436 ± 0.0285

Task3 (0.3173 ± 0.0115)
            MTL               SeqL              PCL-lin           PCL-exp
AuxTask1    0.3223 ± 0.0132   0.3138 ± 0.0084   0.3248 ± 0.0120   0.3116 ± 0.0111
AuxTask2    0.3217 ± 0.0109   0.3222 ± 0.0116   0.3423 ± 0.0113   0.3170 ± 0.0096
AuxTask3    0.3148 ± 0.0074   0.3117 ± 0.0151   0.3229 ± 0.0121   0.3018 ± 0.0116
AuxTask4    0.3330 ± 0.0126   0.3272 ± 0.0137   0.3560 ± 0.0214   0.3188 ± 0.0152
AuxTask5    0.3213 ± 0.0103   0.3188 ± 0.0143   0.3412 ± 0.0091   0.3171 ± 0.0122

Task4 (0.2936 ± 0.0129)
            MTL               SeqL              PCL-lin           PCL-exp
AuxTask1    0.2903 ± 0.0186   0.3248 ± 0.0159   0.2684 ± 0.0208   0.2646 ± 0.0124
AuxTask2    0.2941 ± 0.0123   0.3511 ± 0.0171   0.3565 ± 0.0866   0.2583 ± 0.0156
AuxTask3    0.2979 ± 0.0136   0.3064 ± 0.0165   0.2624 ± 0.0081   0.3440 ± 0.1520
AuxTask4    0.3269 ± 0.0127   0.3712 ± 0.0117   0.4250 ± 0.0350   0.3290 ± 0.0275
AuxTask5    0.3145 ± 0.0142   0.3142 ± 0.0311   0.3334 ± 0.0076   0.3036 ± 0.0182

For the test losses shown in Table 2, applying PCL-exp to the task sets {𝒯1, 𝒯1¹} and {𝒯4, 𝒯4¹} achieved the lowest averaged MAE for forecasting the energy consumption of subsystems 1 and 4 (i.e., the first auxiliary task appears to be the most helpful auxiliary task for subsystems 1 and 4); similarly, applying PCL-exp to the task sets {𝒯2, 𝒯2³} and {𝒯3, 𝒯3³} achieved the lowest averaged MAE for forecasting the energy consumption of subsystems 2 and 3.

Table 2: Comparison of test loss after 200 epochs for different MCL approaches using different auxiliary tasks. The reference performances (using STL) are placed in parentheses. MCL results outperforming the baseline are highlighted in bold, and the best performance for each subsystem is underlined.

Task1 (0.6861 ± 0.0713)
            MTL               SeqL              PCL-lin           PCL-exp
AuxTask1    0.6829 ± 0.0707   0.6920 ± 0.0720   0.6827 ± 0.0727   0.6784 ± 0.0672
AuxTask2    0.6948 ± 0.0736   0.7037 ± 0.0749   0.7248 ± 0.0702   0.7076 ± 0.0709
AuxTask3    0.6812 ± 0.0744   0.6969 ± 0.0634   0.6980 ± 0.0762   0.6943 ± 0.0774
AuxTask4    0.6917 ± 0.0775   0.6934 ± 0.0690   0.7058 ± 0.0634   0.6881 ± 0.0684
AuxTask5    0.6968 ± 0.0712   0.6894 ± 0.0735   0.6913 ± 0.0740   0.6862 ± 0.0730

Task2 (0.4374 ± 0.1821)
            MTL               SeqL              PCL-lin           PCL-exp
AuxTask1    0.4553 ± 0.2035   0.4199 ± 0.1608   0.4277 ± 0.1788   0.4285 ± 0.1770
AuxTask2    0.4322 ± 0.1671   0.4448 ± 0.1766   0.4319 ± 0.1676   0.4329 ± 0.1678
AuxTask3    0.4109 ± 0.1556   0.4427 ± 0.1782   0.4105 ± 0.1398   0.3929 ± 0.1602
AuxTask4    0.4105 ± 0.1632   0.4436 ± 0.1673   0.4699 ± 0.1928   0.4362 ± 0.1752
AuxTask5    0.4702 ± 0.2093   0.4734 ± 0.2158   0.4874 ± 0.2344   0.4632 ± 0.2307

Task3 (0.3827 ± 0.0551)
            MTL               SeqL              PCL-lin           PCL-exp
AuxTask1    0.3777 ± 0.0593   0.3770 ± 0.0627   0.3829 ± 0.0609   0.3774 ± 0.0577
AuxTask2    0.3870 ± 0.0503   0.3901 ± 0.0618   0.4010 ± 0.0534   0.3892 ± 0.0659
AuxTask3    0.3859 ± 0.0534   0.3857 ± 0.0578   0.3868 ± 0.0588   0.3766 ± 0.0684
AuxTask4    0.3847 ± 0.0683   0.3828 ± 0.0563   0.4021 ± 0.0724   0.3865 ± 0.0602
AuxTask5    0.3874 ± 0.0587   0.3868 ± 0.0605   0.4016 ± 0.0563   0.3800 ± 0.0549

Task4 (0.4166 ± 0.0679)
            MTL               SeqL              PCL-lin           PCL-exp
AuxTask1    0.4155 ± 0.0650   0.4783 ± 0.0698   0.4332 ± 0.0801   0.3986 ± 0.0567
AuxTask2    0.4085 ± 0.0745   0.4786 ± 0.0853   0.5225 ± 0.1217   0.4339 ± 0.0813
AuxTask3    0.4240 ± 0.0492   0.5029 ± 0.0554   0.4253 ± 0.0540   0.4874 ± 0.1068
AuxTask4    0.4251 ± 0.0774   0.4548 ± 0.0767   0.4936 ± 0.0718   0.4375 ± 0.0664
AuxTask5    0.4291 ± 0.0613   0.4442 ± 0.0662   0.4535 ± 0.0651   0.4347 ± 0.0716

Figure 3 illustrates the differences between several multi-task curriculum learning strategies, focusing on convergence speed. Specifically, we identify a reference point (epoch) beyond which the gain from further training is limited. This reference point is computed by applying the knee point detection algorithm of Satopaa et al. [7] to the mean STL test losses (shown as grey dots and the corresponding dashed line). The four plots in Figure 3 show the test loss for learning the four primary tasks, each together with its 5th auxiliary task 𝒯i⁵. It is observed in Figure 3 that: i) there is no significant difference between the four approaches for 𝒯1; ii) MTL and PCL-lin drop slightly more slowly than STL and PCL-exp for 𝒯2; iii) both PCL approaches drop more slowly than STL and MTL for 𝒯3; iv) MTL, PCL-lin, and PCL-exp drop faster than STL for 𝒯4. Table 3 shows a comparison between MCL methods of the convergence time to the reference point (computed based on STL mean testing losses over the four folds).
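The reference point can be found with a simple knee-point heuristic in the spirit of the Kneedle algorithm [7]: the knee of a decreasing loss curve is the epoch farthest from the chord joining its endpoints. This is a minimal sketch under that simplification; the paper uses the published algorithm, and all names here are our own.

```python
# Minimal knee-point sketch (Kneedle-style): pick the epoch whose point
# lies farthest from the straight line between the first and last
# points of the loss curve. Names are illustrative assumptions.

def knee_epoch(losses):
    n = len(losses)
    x0, y0, x1, y1 = 0, losses[0], n - 1, losses[-1]

    def dist(i):
        # Perpendicular distance from (i, losses[i]) to the chord.
        num = abs((y1 - y0) * i - (x1 - x0) * losses[i] + x1 * y0 - y1 * x0)
        return num / ((y1 - y0) ** 2 + (x1 - x0) ** 2) ** 0.5

    return max(range(n), key=dist)

# A loss curve that drops quickly and then flattens out.
test_loss = [1.0, 0.6, 0.4, 0.3, 0.28, 0.27, 0.265, 0.26]
print(knee_epoch(test_loss))
```

Applying such a detector to the STL mean test-loss curve yields one reference epoch per task, against which the convergence of each MCL variant is then measured.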
It is observed that: i) MTL outperforms STL in all four primary tasks, converging to the reference point faster than the other approaches in three out of four primary tasks; ii) PCL-lin converged quickly for two of the tasks; iii) PCL-exp achieved better performance than PCL-lin, with overall shorter convergence times. The result for SeqL is particularly interesting. Although an 𝒩ep of 100 epochs is adopted for SeqL (i.e., the model is trained on one of the auxiliary tasks for the first 100 epochs before learning the primary task), the test loss converges to the reference point within roughly 10 epochs of the switch in the majority of cases. From an empirical perspective, the proposed auxiliary tasks assisted the models in learning the primary task, resulting in faster convergence to the reference point.

Figure 3: Comparison of convergence speed for different MCL approaches and auxiliary tasks. (Four panels, one per primary task, show the test loss over the first 50 epochs for STL, MTL, PCL-lin, and PCL-exp.)

Table 3: Comparison of convergence speed to reach a point beyond which the gain from further training is limited. The reference point is given by STL loss sequences averaged over 4 folds. MCL results outperforming the baseline are highlighted in bold, and the best performance for each subsystem is underlined.

Task1 (20 Ep.)
            MTL               SeqL               PCL-lin             PCL-exp
AuxTask1    13.00 ± 1.4142    102.25 ± 0.4330    16.75 ± 2.2776      16.25 ± 8.9268
AuxTask2    32.50 ± 4.0311    125.75 ± 2.2776    103.50 ± 28.8141    45.75 ± 26.6962
AuxTask3    15.75 ± 3.8971    106.50 ± 1.1180    29.00 ± 3.3912      22.25 ± 8.1968
AuxTask4    10.50 ± 1.5000    106.50 ± 2.2913    70.50 ± 32.7605     29.00 ± 15.1493
AuxTask5    15.25 ± 6.7961    102.50 ± 0.5000    18.00 ± 4.3012      19.00 ± 9.0830

Task2 (19 Ep.)
            MTL               SeqL               PCL-lin             PCL-exp
AuxTask1    15.625 ± 7.9047   58.875 ± 50.5827   34.00 ± 22.7211     20.125 ± 14.4606
AuxTask2    18.00 ± 9.2736    108.25 ± 9.8075    44.50 ± 14.2390     23.50 ± 11.4127
AuxTask3    26.00 ± 16.4773   106.75 ± 7.9804    42.50 ± 12.0312     22.75 ± 7.1545
AuxTask4    32.00 ± 33.2039   111.25 ± 13.3112   60.3333 ± 21.6384   32.00 ± 18.8149
AuxTask5    17.00 ± 11.2916   108.50 ± 8.6168    100.00 ± 86.0145    28.00 ± 24.8697

Task3 (14 Ep.)
            MTL               SeqL               PCL-lin             PCL-exp
AuxTask1    9.25 ± 5.4025     104.50 ± 3.5707    22.50 ± 20.6458     10.25 ± 4.8153
AuxTask2    8.50 ± 7.7298     58.75 ± 50.5118    36.125 ± 38.6505    21.875 ± 19.3354
AuxTask3    9.75 ± 7.9175     105.75 ± 3.6997    37.25 ± 20.2160     18.00 ± 8.2765
AuxTask4    11.25 ± 7.5291    105.75 ± 3.2692    55.75 ± 36.8536     24.50 ± 11.9478
AuxTask5    10.00 ± 8.6891    103.75 ± 3.0311    36.25 ± 25.4497     22.25 ± 12.8331

Task4 (19 Ep.)
            MTL               SeqL               PCL-lin             PCL-exp
AuxTask1    14.75 ± 2.1651    107.50 ± 3.3541    25.50 ± 2.9580      21.50 ± 1.1180
AuxTask2    15.75 ± 4.8670    106.00 ± 1.5811    28.50 ± 3.8406      20.75 ± 2.3848
AuxTask3    14.75 ± 2.9896    61.375 ± 47.7178   18.875 ± 8.1000     17.1429 ± 4.4538
AuxTask4    16.50 ± 5.6789    107.50 ± 1.5000    41.50 ± 22.0057     24.00 ± 13.6015
AuxTask5    15.00 ± 4.3589    105.50 ± 3.2016    14.75 ± 3.5620      14.00 ± 4.3589

6. Conclusion and Future Work

In this work-in-progress paper, several multi-task curriculum learning strategies were evaluated for forecasting the energy consumption of auxiliary subsystems in heavy-duty electric vehicles. The preliminary results show that progressive continual learning achieved the best performance (lowest averaged MAE) compared to multi-task learning without any curriculum learning, sequential CL, and the traditional approach (STL).
Moreover, the proposed auxiliary tasks for CL, based on key consumption characteristics, have been shown to be useful for solving all four primary tasks, both in terms of regression error and convergence speed. Future work includes: (i) developing methods to rank the auxiliary tasks by relevance and select the top 𝑘 tasks for CL; (ii) proposing adaptive methods for governing the learning process, e.g., weighting the losses of auxiliary tasks based on learning dynamics; (iii) enabling CL across primary tasks, based on task relevance.

Acknowledgments

The work was carried out with support from the Knowledge Foundation and Vinnova (Sweden's innovation agency) through the Vehicle Strategic Research and Innovation Programme FFI.

References

[1] P. Soviany, R. T. Ionescu, P. Rota, N. Sebe, Curriculum learning: A survey, International Journal of Computer Vision 130 (2022) 1526–1565.
[2] Y. Zhang, Q. Yang, A survey on multi-task learning, IEEE Transactions on Knowledge and Data Engineering 34 (2021) 5586–5609.
[3] Z. Chen, V. Badrinarayanan, C.-Y. Lee, A. Rabinovich, GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks, in: International Conference on Machine Learning, PMLR, 2018, pp. 794–803.
[4] A. Pentina, V. Sharmanska, C. H. Lampert, Curriculum learning of multiple tasks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5492–5500.
[5] S. Siahpour, X. Li, J. Lee, A novel transfer learning approach in remaining useful life prediction for incomplete dataset, IEEE Transactions on Instrumentation and Measurement 71 (2022) 1–11.
[6] J. Shi, X. Yin, Y. Wang, X. Liu, Y. Xie, Y. Qu, Progressive contrastive learning with multi-prototype for unsupervised visible-infrared person re-identification, arXiv preprint arXiv:2402.19026 (2024).
[7] V. Satopaa, J. Albrecht, D. Irwin, B.
Raghavan, Finding a "Kneedle" in a haystack: Detecting knee points in system behavior, in: 2011 31st International Conference on Distributed Computing Systems Workshops, IEEE, 2011, pp. 166–171.
[8] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, A. Lerer, Automatic differentiation in PyTorch, 2017.