=Paper= {{Paper |id=Vol-3765/paper07 |storemode=property |title=Evaluating Multi-task Curriculum Learning for Forecasting Energy Consumption in Electric Heavy-duty Vehicles |pdfUrl=https://ceur-ws.org/Vol-3765/Camera_Ready_Paper-07.pdf |volume=Vol-3765 |authors=Yuantao Fan,Zhenkan Wang,Sepideh Pashami,Sławomir Nowaczyk |dblpUrl=https://dblp.org/rec/conf/haii/FanWPN24 }} ==Evaluating Multi-task Curriculum Learning for Forecasting Energy Consumption in Electric Heavy-duty Vehicles== https://ceur-ws.org/Vol-3765/Camera_Ready_Paper-07.pdf
                                Evaluating Multi-task Curriculum Learning for
                                Forecasting Energy Consumption in Electric
                                Heavy-duty Vehicles
                                Yuantao Fan1,* , Sławomir Nowaczyk1 , Zhenkan Wang2 and Sepideh Pashami1,3
                                1
                                  Kristian IV:s väg 3, 301 18 Halmstad, Sweden, Center for Applied Intelligent Systems Research (CAISR), Halmstad
                                University
                                2
                                  Gropegårdsgatan 2, 417 15 Göteborg, Sweden, Volvo Group
                                3
                                  Isafjordsgatan 28 A, 164 40 Kista, Sweden, Research Institutes of Sweden (RISE)


                                            Abstract
                                            Accurate energy consumption prediction is crucial for optimising the operation of electric commercial
                                            heavy-duty vehicles, particularly for efficient route planning, refining charging strategies, and ensuring
                                            optimal truck configuration for specific tasks. This study investigates the application of multi-task
                                            curriculum learning to enhance machine learning models for forecasting the energy consumption of
                                            various onboard systems in electric vehicles. Multi-task learning, unlike traditional training approaches,
                                            leverages auxiliary tasks to provide additional training signals, which has been shown to enhance
                                            predictive performance in many domains. By further incorporating curriculum learning, where simpler
                                            tasks are learned before progressing to more complex ones, neural network training becomes more
                                            efficient and effective.
                                                 We evaluate the suitability of these methodologies in the context of electric vehicle energy forecasting,
                                            examining whether the combination of multi-task learning and curriculum learning enhances algorithm
                                            generalisation, even with limited training data. We primarily focus on understanding the efficacy of
                                            different curriculum learning strategies, including sequential learning and progressive continual learning,
                                            using complex, real-world industrial data.
                                                 Our research further explores a set of auxiliary tasks designed to facilitate the learning process by
                                            targeting key consumption characteristics projected into future time frames. The findings illustrate the
                                            potential of multi-task curriculum learning to advance energy consumption forecasting, significantly
                                            contributing to the optimisation of electric heavy-duty vehicle operations. This work offers a novel
                                            perspective on integrating advanced machine learning techniques to enhance energy efficiency in the
                                            exciting field of electromobility.

                                            Keywords
                                            Energy Consumption Forecasting, Curriculum Learning, Multi-task Learning, Electric Vehicles




                                1. Introduction
                                Predicting energy consumption for electric vehicles (EVs), especially those used in commercial
                                heavy-duty contexts, is paramount for improving their operational efficiency and promoting

                                HAII5.0: Embracing Human-Aware AI in Industry 5.0, at ECAI 2024, 19 October 2024, Santiago de Compostela, Spain.
                                *
                                 Corresponding author.
                                $ yuantao.fan@hh.se (Y. Fan); slawomir.nowaczyk@hh.se (S. Nowaczyk); zhenkan.wang@volvo.com (Z. Wang);
                                sepideh.pashami@hh.se (S. Pashami)
                                 0000-0002-3034-6630 (Y. Fan); 0000-0002-7796-5201 (S. Nowaczyk); 0000-0003-3272-4145 (S. Pashami)
                                          © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).




CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
sustainability. Effective energy consumption forecasts are indispensable for strategic route
planning, optimising charging protocols, and ensuring that vehicle configurations align well
with specific operational demands. As electric vehicles gain traction as a viable and eco-
friendly alternative to internal combustion engine vehicles, the importance of precise energy
consumption predictions becomes increasingly pronounced. The challenges in this domain
are multifaceted, stemming from the inherent variability in driving conditions, vehicle load,
and diverse environmental factors, which collectively complicate the development of accurate
predictive models. Overcoming these obstacles is essential not only for enhancing the reliability
and performance of EVs but also for minimising operational costs and boosting the overall
efficiency of electric transport systems.
   The transition to electric vehicles is a significant step towards reducing greenhouse gas
emissions and achieving sustainable transportation goals. However, since limited energy storage
puts unique constraints on which operations are feasible, the benefits of EVs can only be fully
realised through the development of specified forecasting methods that accurately anticipate
energy needs. In this context, AI and ML emerge as transformative tools. AI-driven models can
analyse vast amounts of data to uncover patterns and relationships that are not immediately
apparent, providing more accurate and reliable energy consumption forecasts. These models
can adapt to new data, continuously improving their predictions over time.
   Nevertheless, energy consumption forecasting for EVs faces critical challenges, such as
dynamic driving conditions and fluctuating loads, which makes even state-of-the-art methods
struggle to handle complex real-world data effectively. While the potential to learn from
historical data and identify trends that influence energy consumption is the biggest strength
of ML-based approaches, it is crucial to develop robust models that can generalise well across
different scenarios and vehicle types.
   The complexity and variability inherent in forecasting energy consumption for electric
vehicles make it a relevant testing ground for cutting-edge modelling techniques that promise
to handle diverse and dynamic data inputs. In particular, Multi-Task Learning (MTL) presents a
compelling solution by enabling simultaneous training across multiple related tasks, thereby
leveraging shared information to improve the predictive performance of each task. In contrast,
traditional training utilises only the target task itself. MTL is particularly beneficial in
scenarios with limited training data, as it enhances generalisation by incorporating auxiliary
tasks that provide additional training signals. Moreover, the efficacy of
MTL can be further amplified by integrating curriculum learning (CL), which structures the
learning process in a progressive manner. Curriculum learning organises tasks from simple to
complex, allowing the model to build a robust foundation before tackling more challenging
problems. By combining these methodologies into multi-task curriculum learning (MCL),
we can efficiently train neural networks that not only perform better on individual tasks
but also generalise more effectively across different contexts. MCL optimises the learning
trajectory, ensuring that simpler tasks enhance the model’s capability to learn more complex
ones, ultimately leading to more accurate and reliable energy consumption forecasts for electric
heavy-duty vehicles. This integrative approach has been shown to be a potent strategy to
address the multifaceted challenges in several domains but has not been applied to EV auxiliary
energy forecasting before. Thus, this paper aims to evaluate the suitability of MCL in this
real-world, complex scenario.
   Generating a set of auxiliary tasks is a critical step in the implementation of MCL – and
how to do it for forecasting energy consumption in EVs requires experimental evaluation. To
create auxiliary tasks, one must first obtain an understanding of the primary task, identifying
key factors and variables that influence energy consumption and the types of patterns that are
indicative of future behaviour. These factors often include vehicle load, driving speed, route
characteristics, weather conditions, and driver behaviour. Each of these variables can serve
as the basis for an auxiliary task. For instance, an auxiliary task might involve predicting the
impact of vehicle load on energy consumption under different traffic conditions or estimating
the effect of varying driving speeds on battery usage. Historical data from real-world vehicle
operations can be mined to extract relevant patterns and correlations, which can then be used to
define these auxiliary tasks. In this paper, we have decided to focus on the patterns within the
forecasted value itself instead of exploiting multivariate vehicle signals. In particular, we define
several types of energy consumption characteristics as targets for the auxiliary tasks, such as
questioning whether the consumption in the next time frame exceeds the global mean, whether
the consumption will be higher in the next time frame compared to the current consumption,
or predicting the consumption difference between the start and the end of the next time frame.
These tasks are general enough to be suitable for any forecasting task, while at the same
time being sufficiently closely related to the actual primary task to, hopefully, provide useful
information to boost the training process.
   The core contribution of this paper is the evaluation of applying several multi-task curriculum
learning techniques for forecasting the energy consumption of heavy-duty electric vehicles,
including the proposition of utilising key consumption characteristics as targets for generating
auxiliary tasks for MCL. A comparison of MCL variations, with combinations of curriculum
learning strategy (sequential learning and progressive continual learning) and auxiliary tasks,
illustrates the improvements in performance on real-world data collected from the normal
operations of commercial transportation electric vehicles. The experimental results show that
progressive continual learning, with a logistic growth weighting function governing the learning
balance between the primary and the auxiliary task, achieves the best performance. The results
also show that the first auxiliary task is the most helpful for subsystems 1 and 4, while the
third auxiliary task is the most helpful for subsystems 2 and 3. Furthermore, it is observed
that MCL with the proposed auxiliary tasks can improve the learning efficiency of the model,
achieving faster convergence to a point beyond which the gain from further training is limited.


2. Related Work
Curriculum learning enables training machine learning models in a meaningful order, from
easy samples to difficult and complex ones [1]. A common approach to CL imposes an
easy-to-hard ordering of samples in the training process, e.g., vanilla CL, self-paced
CL, balanced CL, etc. When multiple tasks are available, an easy-to-hard ordering of the tasks
to be learned can be applied as well. Multi-task learning shares information across a set of
related tasks during training, and performance can be further improved [2] via, e.g., GradNorm [3],
which balances the losses between multiple tasks. While most multi-task
learning approaches aim to learn multiple tasks simultaneously, progressive curriculum learning
allows determining the best order in which to learn multiple tasks to maximise the final result.
Pentina et al. [4] find the best order of tasks to be learned in a sequence, based on a
generalisation bound criterion, to optimise the average expected classification performance
over all the tasks. Siahpour et al. [5] introduced a penalty coefficient, as a function of
the epoch step, to govern the training process by suppressing the loss (and noise) from the
domain discrimination task in the early stage, to ensure efficient training of neural
networks. Shi et al. proposed progressive contrastive learning [6] based on multiple prototypes
in the dataset: the training process is ordered to learn the centroid prototype first, followed
by the hard prototype, and finally the dynamic prototype. In this work, we explore sequential
learning and progressive continual learning with a set of auxiliary tasks generated from
key characteristics of the target signal.


3. Problem Formulation
For a given primary learning task 𝒯𝑖 , we create a set of auxiliary tasks 𝒯𝑖𝑗 , where 𝒯𝑖 corresponds
to the primary task (in our case, the forecasting of energy consumption for the 𝑖-th auxiliary
subsystem in an electric truck), and 𝒯𝑖𝑗 corresponds to the 𝑗-th type of auxiliary task.
    The majority of the multi-task learning studies aim to learn all relevant tasks together to
improve the performance for each task 𝒯𝑖 . In our study, we are only interested in improving
the energy forecasting tasks 𝒯𝑖 , not the generated auxiliary tasks 𝒯𝑖𝑗 . All energy forecasting
tasks and the auxiliary tasks are learned from the same dataset: multivariate time series of
sensor readings collected from the normal operations of several heavy-duty electric vehicles.
    Let us denote the multivariate time series data x of each vehicle 𝑣 by 𝑋 = { 𝑥𝑘𝑣,𝑡 | 𝑡 =
1, 2, ..., 𝑇𝑒 (𝑣), 𝑘 = 1, 2, ..., 𝐾}, where 𝑥𝑘𝑣,𝑡 is the value of the 𝑘-th feature x given a vehicle/
trajectory 𝑣 at time 𝑡, and 𝑇𝑒 (𝑣) corresponds to the end of the recording. A subset of the features
𝑢𝑖𝑣,𝑡 reflects the energy consumption of subsystem 𝑖 at time 𝑡. The target energy consumption
𝑦𝑖𝑣,𝑡0 in a future time frame 𝜏𝑝ℎ can be approximated by summing up the energy consumed over
this time frame, 𝑦𝑖𝑣,𝑡0 = Σ𝑡∈[𝑡0 ,𝑡0 +𝜏𝑝ℎ ] 𝑝𝑖 (𝑡) · Δ𝑡, where 𝑝𝑖 (𝑡) is the power consumption at time
𝑡, and Δ𝑡 is the time interval between two samples.
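As a concrete illustration, the Riemann-sum approximation above can be written as a small helper; the function name and the sampling setup in the example are illustrative, not from the paper:

```python
import numpy as np

def frame_energy(power, t0, horizon_steps, dt):
    """Approximate the energy consumed over [t0, t0 + horizon) by summing
    the power samples p_i(t), each weighted by the sampling interval dt."""
    return float(np.sum(power[t0:t0 + horizon_steps]) * dt)

# e.g. a constant 2 kW draw sampled every 10 s over a 10-minute frame
power_kw = np.full(60, 2.0)
energy_kj = frame_energy(power_kw, 0, 60, dt=10.0)  # 2 kW * 600 s = 1200 kJ
```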
    In this study, we set 𝜏𝑝ℎ equal to 10 minutes. For a given forecasting task 𝒯𝑖 , a regression model
𝑓𝑖 (·) is trained together with one of the auxiliary tasks 𝒯𝑖𝑗 to estimate the consumption 𝑦𝑖𝑣,𝑡 . In this
study, neural networks with a shared feature extractor and multiple heads, each corresponding to
one task, were trained under different settings and evaluated for their performance after 200
training epochs. We explore different multi-task curriculum learning settings and auxiliary
tasks for forecasting energy consumption. The MCL methods were compared to the traditional
single-task approach.


4. Method
4.1. Auxiliary Tasks
For a given regression task 𝒯𝑖 (forecasting energy consumption for one of the subsystems), a set
of auxiliary tasks was generated to assist the learning process. We explore the use of five types
of consumption characteristics as targets for creating the auxiliary tasks: i) 𝒯𝑖1 : classifying
whether the consumption in the next time frame exceeds the global mean for that subsystem 𝑖;
ii) 𝒯𝑖2 : classifying whether the consumption will increase in the next time frame, compared with
the current consumption; iii) 𝒯𝑖3 : classifying whether the consumption at the end of the next
time frame exceeds the starting point; iv) 𝒯𝑖4 : predicting the consumption difference between
the start and the end of the next time frame; v) 𝒯𝑖5 : predicting the difference between the peak
consumption and the lowest consumption in the next time frame. The first three auxiliary tasks
are classification tasks; the other two are regression tasks. Learning to predict these key
consumption characteristics through the auxiliary tasks 𝒯𝑖𝑗 , along with the primary tasks 𝒯𝑖 ,
under MCL is evaluated for its usefulness.
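Under these definitions, the five auxiliary targets can be derived directly from the power series of a subsystem. The sketch below fixes one plausible reading, e.g. taking the preceding frame as the "current consumption"; that choice, and the helper's name, are assumptions rather than the paper's exact recipe:

```python
import numpy as np

def auxiliary_targets(p, t0, horizon, global_mean):
    """Derive the five auxiliary targets for the frame p[t0:t0+horizon).

    p: 1-D array of per-step consumption for one subsystem.
    global_mean: mean frame consumption for that subsystem over the data.
    """
    frame = p[t0:t0 + horizon]
    prev = p[max(t0 - horizon, 0):t0]          # preceding frame, proxy for "current"
    return {
        "aux1": float(frame.sum() > global_mean),   # T_i^1: exceeds global mean
        "aux2": float(frame.sum() > prev.sum()),    # T_i^2: increases vs. current
        "aux3": float(frame[-1] > frame[0]),        # T_i^3: end exceeds start
        "aux4": float(frame[-1] - frame[0]),        # T_i^4: start-to-end difference
        "aux5": float(frame.max() - frame.min()),   # T_i^5: peak-to-trough range
    }
```

The first three entries are binary classification targets; the last two are regression targets, matching the task types listed above.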

4.2. Network Architecture
The regression model evaluated for MCL in this study builds on a multi-layer perceptron. The
model comprises a shared feature extractor and two heads: one head carries out the main
task 𝒯𝑖 , and the other corresponds to one of the five auxiliary tasks 𝒯𝑖𝑗 . The network architecture
is illustrated in Figure 1. For auxiliary tasks that are classification tasks, a sigmoid function was
applied to the output of the corresponding head.
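A minimal PyTorch sketch of this two-head architecture follows; the class name, layer count, and hidden sizes are illustrative assumptions, since the paper does not specify them:

```python
import torch
import torch.nn as nn

class MCLNet(nn.Module):
    """Shared MLP feature extractor with a primary regression head and one
    auxiliary head (hidden sizes are illustrative, not from the paper)."""
    def __init__(self, n_features, hidden=64, aux_is_classification=True):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.primary_head = nn.Linear(hidden, 1)  # energy forecast for task T_i
        self.aux_head = nn.Linear(hidden, 1)      # one auxiliary task T_i^j
        # sigmoid on the auxiliary head for the three classification tasks
        self.aux_act = nn.Sigmoid() if aux_is_classification else nn.Identity()

    def forward(self, x):
        z = self.backbone(x)
        return self.primary_head(z), self.aux_act(self.aux_head(z))
```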

4.3. Curriculum Learning Strategy
The two curriculum learning strategies evaluated in this work are sequential learning (SeqL)
and progressive continual learning (PCL). The overall optimisation loss ℒ can be defined as:

                                     ℒ = 𝜆 ℒ𝒯𝑖 + (1 − 𝜆) ℒ𝒯𝑖𝑗                                      (1)

where ℒ𝒯𝑖 denotes the loss for the primary task, while ℒ𝒯𝑖𝑗 denotes the loss for the auxiliary
task 𝑗. The SeqL employed imposes a fixed ordering of the tasks, e.g. learning the auxiliary task
first, before a predetermined epoch, and the primary task afterwards:

                             𝜆𝑆𝑒𝑞𝐿 = 0 if 𝜂 < 𝒩𝑒𝑝 ;   𝜆𝑆𝑒𝑞𝐿 = 1 if 𝜂 ≥ 𝒩𝑒𝑝                        (2)

where 𝜂 is the current training epochs, and 𝒩𝑒𝑝 is the number of epochs predetermined to
switch to another task.
   The PCL employs a weighting mechanism, a function of training epochs, to govern the
learning process and gradually increases the weights on the loss corresponding to the primary
task:
                             𝜆𝑃𝐶𝐿 = 2 / (1 + exp(−10𝛼𝜂/𝒩𝑡𝑜𝑡 )) − 1                                 (3)
where 𝛼 is a coefficient governing the change rate (see Figure 2 for an illustration), 𝜂 is the
current training epoch, and 𝒩𝑡𝑜𝑡 is the total number of training epochs. The two curriculum
learning strategies were compared with MTL without any special curriculum, and with learning
only on the primary task 𝒯𝑖 .
Figure 1: MLP network architecture
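The two weighting schedules in Eq. (2) and Eq. (3), and the combined loss of Eq. (1), are straightforward to implement; a minimal sketch (function names are ours):

```python
import math

def lambda_seql(epoch, n_switch):
    """Eq. (2): auxiliary task only (lambda = 0) before the switch epoch N_ep,
    primary task only (lambda = 1) afterwards."""
    return 0.0 if epoch < n_switch else 1.0

def lambda_pcl(epoch, n_total, alpha):
    """Eq. (3): logistic growth of the primary-task weight from 0 towards 1."""
    return 2.0 / (1.0 + math.exp(-10.0 * alpha * epoch / n_total)) - 1.0

def combined_loss(loss_primary, loss_aux, lam):
    """Eq. (1): convex combination of primary and auxiliary losses."""
    return lam * loss_primary + (1.0 - lam) * loss_aux
```

Note that 𝜆𝑃𝐶𝐿 starts at exactly 0 and approaches 1 by the final epoch when 𝛼 = 1, while small 𝛼 (e.g. 0.1) keeps the schedule in the near-linear regime of the logistic curve.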


   The two evaluation criteria in this study are (i) the test loss (Mean Absolute Error, MAE) after
training converged, i.e., after 𝒩𝑡𝑜𝑡 epochs, and (ii) whether the proposed learning strategy achieves a
faster convergence time, i.e., the epoch at which the test loss has reached a saturation point (no
further significant decrease in the loss afterwards). In each case, different variants of MCL are
compared against a learning process without any multi-task curriculum learning. The saturation
point is detected using a knee point detection algorithm [7], proposed by Satopaa et al.
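The saturation point can be located with any knee-point detector. The following stand-in works in the spirit of Satopaa et al.'s Kneedle algorithm [7], picking the epoch of maximum distance below the chord joining the curve's endpoints; it is a simplified sketch, not necessarily the exact implementation used in the experiments:

```python
import numpy as np

def knee_epoch(test_losses):
    """Locate the epoch beyond which further training gains little: the point
    of maximum distance below the straight line joining the first and last
    normalised loss values (simplified Kneedle-style detection)."""
    y = np.asarray(test_losses, dtype=float)
    x = np.linspace(0.0, 1.0, len(y))
    yn = (y - y.min()) / (y.max() - y.min())   # normalise losses to [0, 1]
    chord = yn[0] + (yn[-1] - yn[0]) * x       # straight line between endpoints
    return int(np.argmax(chord - yn))          # largest gap below the chord
```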
[Figure: 𝜆𝑃𝐶𝐿 as a function of training epochs (0–200) for 𝛼 = 0.1, 0.3, 0.6, and 1]

Figure 2: PCL weighting function


5. Experiment Result
The energy consumption dataset was collected from several electric trucks operating in different
countries over a couple of months. It includes sensor readings, recorded per driving session, of
mileage, speed, ambient temperature, and the energy consumed by the auxiliary subsystems.
The four subsystems we forecast energy consumption for are the air compressor 𝒯1 , the air
conditioner 𝒯2 , the cabin heater 𝒯3 , and the heater of the energy storage system 𝒯4 .
   For the experiments conducted in this paper, the neural networks were implemented with the
PyTorch library [8], using the Adam optimiser with a learning rate of 0.001. The loss function
for the regression tasks is mean absolute error (MAE), and binary cross-entropy (BCE) was
employed as the loss function for the classification tasks. The total number of training epochs
𝒩𝑡𝑜𝑡 is set to 200. For the sequential learning strategy, 𝒩𝑒𝑝 is set to 100, and for the progressive
continual learning, 𝛼 values of 0.1 (i.e. an approximately linear function) and 0.3 are tested. The
experiments were conducted using 4-fold cross-validation split session-wise, i.e. data from the
same driving session never appears in both the training and the testing populations.
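A session-wise split of this kind can be expressed with scikit-learn's GroupKFold, treating each driving session as a group; the data shapes below are dummy placeholders, not the paper's dataset:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))              # dummy feature windows
y = rng.normal(size=100)                   # dummy energy targets
sessions = np.repeat(np.arange(20), 5)     # 20 driving sessions, 5 windows each

for train_idx, test_idx in GroupKFold(n_splits=4).split(X, y, groups=sessions):
    # no driving session appears on both sides of the split
    assert set(sessions[train_idx]).isdisjoint(sessions[test_idx])
```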
   Table 1 and Table 2 show the training and testing losses after 200 epochs of training of the
neural networks using multi-task learning without any curriculum learning (MTL), sequential
learning (SeqL), progressive continual learning with an 𝛼 of 0.1 (PCL-lin), and an 𝛼 of 0.3
(PCL-exp). The baseline performance, single-task learning (STL), is produced by learning
only on the primary task 𝒯𝑖 for each subsystem, and is shown in parentheses. Both tables
show that the lowest averaged MAE is achieved using PCL-exp. As a sanity check, Table 1
demonstrates that the training losses of most MCL methods, after 200 epochs of training,
did converge to a level comparable to STL. For the testing losses shown in Table 2, applying
PCL-exp on the task sets {𝒯1 , 𝒯11 } and {𝒯4 , 𝒯41 } achieved the lowest averaged MAE for forecasting
the energy consumption of subsystems 1 and 4 (i.e., the first auxiliary task appears to be the most
Table 1
Comparison of training loss after 200 epochs for different MCL approaches using different auxiliary
tasks. The reference performances (using STL) are placed in parentheses. MCL results outperforming
the baseline are highlighted in bold, and the best performance for each subsystem is underlined.


 Task1 (0.6202 ± 0.0163)         MTL                SeqL              PCL-lin            PCL-exp
        AuxTask1           0.6125 ± 0.0141    0.6054 ± 0.0167      0.609 ± 0.0185    0.6013 ± 0.0156
        AuxTask2            0.6374 ± 0.0147    0.6502 ± 0.0138     0.6802 ± 0.0096    0.6435 ± 0.0156
        AuxTask3           0.6165 ± 0.0166    0.6131 ± 0.0118     0.6076 ± 0.0154    0.6033 ± 0.0155
        AuxTask4            0.6239 ± 0.0132     0.626 ± 0.0119     0.6567 ± 0.0161   0.6182 ± 0.0121
        AuxTask5            0.6256 ± 0.0113   0.6152 ± 0.0121       0.625 ± 0.0147   0.614 ± 0.0117
 Task2 (0.2617 ± 0.0245)         MTL                SeqL              PCL-lin            PCL-exp
        AuxTask1            0.2681 ± 0.023      0.276 ± 0.0128     0.2959 ± 0.0157   0.2541 ± 0.0171
        AuxTask2            0.2619 ± 0.0158    0.3016 ± 0.0308     0.2939 ± 0.0204   0.2475 ± 0.037
        AuxTask3            0.2662 ± 0.0395    0.2862 ± 0.0324    0.2379 ± 0.0245    0.2158 ± 0.0186
        AuxTask4            0.2534 ± 0.011     0.2866 ± 0.0255     0.2795 ± 0.0202   0.2366 ± 0.0432
        AuxTask5            0.2638 ± 0.0168    0.2691 ± 0.0361     0.2971 ± 0.0167   0.2436 ± 0.0285
 Task3 (0.3173 ± 0.0115)         MTL                SeqL              PCL-lin            PCL-exp
        AuxTask1            0.3223 ± 0.0132   0.3138 ± 0.0084      0.3248 ± 0.012    0.3116 ± 0.0111
        AuxTask2            0.3217 ± 0.0109    0.3222 ± 0.0116    0.3423 ± 0.0113    0.317 ± 0.0096
        AuxTask3           0.3148 ± 0.0074    0.3117 ± 0.0151     0.3229 ± 0.0121    0.3018 ± 0.0116
        AuxTask4            0.333 ± 0.0126     0.3272 ± 0.0137    0.356 ± 0.0214      0.3188 ± 0.0152
        AuxTask5            0.3213 ± 0.0103    0.3188 ± 0.0143    0.3412 ± 0.0091    0.3171 ± 0.0122
 Task4 (0.2936 ± 0.0129)         MTL                SeqL              PCL-lin            PCL-exp
        AuxTask1           0.2903 ± 0.0186     0.3248 ± 0.0159    0.2684 ± 0.0208    0.2646 ± 0.0124
        AuxTask2            0.2941 ± 0.0123    0.3511 ± 0.0171     0.3565 ± 0.0866   0.2583 ± 0.0156
        AuxTask3            0.2979 ± 0.0136    0.3064 ± 0.0165    0.2624 ± 0.0081      0.344 ± 0.152
        AuxTask4            0.3269 ± 0.0127    0.3712 ± 0.0117      0.425 ± 0.035      0.329 ± 0.0275
        AuxTask5            0.3145 ± 0.0142    0.3142 ± 0.0311     0.3334 ± 0.0076    0.3036 ± 0.0182



helpful auxiliary task for subsystems 1 and 4); similarly, applying PCL-exp on the task sets {𝒯2 , 𝒯23 }
and {𝒯3 , 𝒯33 } achieved the lowest averaged MAE for forecasting the energy consumption of
subsystems 2 and 3.
   Figure 3 illustrates the differences between several multi-task curriculum learning strategies,
focusing on the convergence speed. Specifically, we identify a reference point (epoch) beyond
which the gain from further training is limited. This reference point is computed using a knee
point detection method (algorithm [7] by Satopaa et al.) on the mean STL test losses (shown as
grey dots and the corresponding dashed line). The four plots in Figure 3 illustrate the testing loss
for learning the four primary tasks, along with their 5-th auxiliary task, i.e. 𝒯𝑖5 . It is observed
in Figure 3 that: i) there is no significant difference between the four approaches for 𝒯1 ; ii) MTL
and PCL-lin drop slightly slower compared to STL and PCL-exp for 𝒯2 ; iii) both PCL approaches
drop slower compared with STL and MTL for 𝒯3 ; iv) MTL, PCL-lin, and PCL-exp drop faster
Table 2
Comparison of test loss after 200 epochs for different MCL approaches using different auxiliary
tasks. The reference performances (using STL) are placed in parentheses. MCL results outperforming
the baseline are highlighted in bold, and the best performance for each subsystem is underlined.


 Task1 (0.6861 ± 0.0713)        MTL                SeqL              PCL-lin           PCL-exp
       AuxTask1            0.6829 ± 0.0707     0.692 ± 0.072     0.6827 ± 0.0727    0.6784 ± 0.0672
       AuxTask2             0.6948 ± 0.0736   0.7037 ± 0.0749     0.7248 ± 0.0702    0.7076 ± 0.0709
       AuxTask3            0.6812 ± 0.0744    0.6969 ± 0.0634     0.698 ± 0.0762     0.6943 ± 0.0774
       AuxTask4             0.6917 ± 0.0775    0.6934 ± 0.069     0.7058 ± 0.0634    0.6881 ± 0.0684
       AuxTask5             0.6968 ± 0.0712   0.6894 ± 0.0735     0.6913 ± 0.074      0.6862 ± 0.073
 Task2 (0.4374 ± 0.1821)        MTL                SeqL              PCL-lin           PCL-exp
       AuxTask1             0.4553 ± 0.2035   0.4199 ± 0.1608    0.4277 ± 0.1788    0.4285 ± 0.177
       AuxTask2            0.4322 ± 0.1671     0.4448 ± 0.1766   0.4319 ± 0.1676    0.4329 ± 0.1678
       AuxTask3            0.4109 ± 0.1556     0.4427 ± 0.1782   0.4105 ± 0.1398    0.3929 ± 0.1602
       AuxTask4            0.4105 ± 0.1632     0.4436 ± 0.1673    0.4699 ± 0.1928   0.4362 ± 0.1752
       AuxTask5             0.4702 ± 0.2093    0.4734 ± 0.2158    0.4874 ± 0.2344    0.4632 ± 0.2307
 Task3 (0.3827 ± 0.0551)        MTL                SeqL              PCL-lin           PCL-exp
       AuxTask1            0.3777 ± 0.0593    0.377 ± 0.0627     0.3829 ± 0.0609    0.3774 ± 0.0577
       AuxTask2             0.387 ± 0.0503    0.3901 ± 0.0618    0.401 ± 0.0534      0.3892 ± 0.0659
       AuxTask3             0.3859 ± 0.0534   0.3857 ± 0.0578    0.3868 ± 0.0588    0.3766 ± 0.0684
       AuxTask4             0.3847 ± 0.0683   0.3828 ± 0.0563    0.4021 ± 0.0724     0.3865 ± 0.0602
       AuxTask5             0.3874 ± 0.0587   0.3868 ± 0.0605    0.4016 ± 0.0563      0.38 ± 0.0549
 Task4 (0.4166 ± 0.0679)        MTL                SeqL              PCL-lin           PCL-exp
       AuxTask1            0.4155 ± 0.065     0.4783 ± 0.0698    0.4332 ± 0.0801    0.3986 ± 0.0567
       AuxTask2            0.4085 ± 0.0745    0.4786 ± 0.0853    0.5225 ± 0.1217     0.4339 ± 0.0813
       AuxTask3             0.424 ± 0.0492    0.5029 ± 0.0554    0.4253 ± 0.054      0.4874 ± 0.1068
       AuxTask4             0.4251 ± 0.0774   0.4548 ± 0.0767    0.4936 ± 0.0718     0.4375 ± 0.0664
       AuxTask5             0.4291 ± 0.0613   0.4442 ± 0.0662    0.4535 ± 0.0651     0.4347 ± 0.0716




compared to STL.
   Table 3 shows a comparison between the MCL methods on the convergence time to the reference
point (computed from the STL mean testing losses averaged over the four folds). It is observed that: i)
MTL outperforms STL in all four primary tasks, and converged to the reference point faster than
the other approaches in three out of four primary tasks; ii) PCL-lin converged fast for two of the
tasks; iii) PCL-exp achieved better performance than PCL-lin, with an overall shorter convergence
time. The result corresponding to SeqL is particularly interesting. Although an 𝒩𝑒𝑝 of 100 epochs
is adopted for SeqL (i.e. the model is trained on one of the auxiliary tasks for the first 100 epochs
before switching to the primary task), the testing loss then converges to the reference point within
roughly 10 epochs in the majority of the cases. From an empirical perspective, the proposed
auxiliary tasks assisted the learning of the models for the primary task, resulting
[Figure: test loss vs. epoch (0–50) for the four primary tasks (Task 1–4), comparing STL, MTL, PCL-lin, and PCL-exp]


Figure 3: Comparison of convergence speed for different MCL approaches and auxiliary tasks.


in faster convergence to the reference point.
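Concretely, the convergence speed reported in Table 3 can be measured as the first epoch at which a method's test loss drops to the STL reference level (the STL loss averaged over the four folds). The following is a minimal sketch of that measurement; the function name and the loss curves are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def epochs_to_reference(test_loss, reference):
    """First (1-indexed) epoch at which the test loss reaches the
    reference level; None if it never does. Illustrative sketch."""
    test_loss = np.asarray(test_loss, dtype=float)
    hits = np.nonzero(test_loss <= reference)[0]
    return int(hits[0]) + 1 if hits.size else None

# Hypothetical curves: the reference is the converged STL loss,
# averaged over folds, as described for Table 3.
stl_folds = np.array([[1.00, 0.80, 0.70, 0.65, 0.64],
                      [1.10, 0.90, 0.72, 0.66, 0.65]])
reference = stl_folds.mean(axis=0)[-1]            # 0.645
mcl_curve = [0.90, 0.70, 0.60, 0.58, 0.57]
print(epochs_to_reference(mcl_curve, reference))  # → 3
```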
Table 3
Comparison of convergence speed to reach a point beyond which the gain from further training is limited.
The reference point is given by STL loss sequences averaged over 4 folds. MCL results outperforming
the baseline are highlighted in bold, and the best performance for each subsystem is underlined.


    Task1 (20 Ep.)         MTL                 SeqL              PCL-lin             PCL-exp
       AuxTask1       13.0 ± 1.4142      102.25 ± 0.433      16.75 ± 2.2776       16.25 ± 8.9268
       AuxTask2        32.5 ± 4.0311     125.75 ± 2.2776     103.5 ± 28.8141      45.75 ± 26.6962
       AuxTask3       15.75 ± 3.8971      106.5 ± 1.118       29.0 ± 3.3912       22.25 ± 8.1968
       AuxTask4         10.5 ± 1.5       106.5 ± 2.2913       70.5 ± 32.7605      29.0 ± 15.1493
       AuxTask5       15.25 ± 6.7961       102.5 ± 0.5        18.0 ± 4.3012        19.0 ± 9.083
    Task2 (19 Ep.)         MTL                 SeqL              PCL-lin             PCL-exp
       AuxTask1      15.625 ± 7.9047     58.875 ± 50.5827    34.0 ± 22.7211      20.125 ± 14.4606
       AuxTask2       18.0 ± 9.2736      108.25 ± 9.8075      44.5 ± 14.239       23.5 ± 11.4127
       AuxTask3       26.0 ± 16.4773     106.75 ± 7.9804     42.5 ± 12.0312       22.75 ± 7.1545
       AuxTask4       32.0 ± 33.2039     111.25 ± 13.3112   60.3333 ± 21.6384     32.0 ± 18.8149
       AuxTask5      17.0 ± 11.2916       108.5 ± 8.6168     100.0 ± 86.0145      28.0 ± 24.8697
    Task3 (14 Ep.)         MTL                 SeqL              PCL-lin             PCL-exp
       AuxTask1       9.25 ± 5.4025       104.5 ± 3.5707      22.5 ± 20.6458     10.25 ± 4.8153
       AuxTask2        8.5 ± 7.7298      58.75 ± 50.5118     36.125 ± 38.6505    21.875 ± 19.3354
       AuxTask3       9.75 ± 7.9175      105.75 ± 3.6997      37.25 ± 20.216       18.0 ± 8.2765
       AuxTask4       11.25 ± 7.5291     105.75 ± 3.2692     55.75 ± 36.8536      24.5 ± 11.9478
       AuxTask5       10.0 ± 8.6891      103.75 ± 3.0311     36.25 ± 25.4497      22.25 ± 12.8331
    Task4 (19 Ep.)         MTL                 SeqL              PCL-lin             PCL-exp
       AuxTask1       14.75 ± 2.1651      107.5 ± 3.3541       25.5 ± 2.958         21.5 ± 1.118
       AuxTask2       15.75 ± 4.867       106.0 ± 1.5811       28.5 ± 3.8406      20.75 ± 2.3848
       AuxTask3       14.75 ± 2.9896     61.375 ± 47.7178      18.875 ± 8.1      17.1429 ± 4.4538
       AuxTask4       16.5 ± 5.6789         107.5 ± 1.5       41.5 ± 22.0057      24.0 ± 13.6015
       AuxTask5       15.0 ± 4.3589       105.5 ± 3.2016      14.75 ± 3.562       14.0 ± 4.3589
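The PCL-lin and PCL-exp variants compared above differ only in how the weight on the auxiliary-task loss is scheduled over training. The exact schedules used in the experiments are not reproduced here; the sketch below assumes a linear and an exponential decay of the auxiliary weight, with illustrative names and a hypothetical horizon of 100 epochs:

```python
import math

def pcl_weight_linear(epoch, total=100):
    """Auxiliary-task weight decaying linearly from 1 to 0 (assumed form)."""
    return max(0.0, 1.0 - epoch / total)

def pcl_weight_exp(epoch, tau=20.0):
    """Auxiliary-task weight decaying exponentially (assumed form)."""
    return math.exp(-epoch / tau)

def combined_loss(primary, auxiliary, epoch, schedule=pcl_weight_linear):
    """Blend the two losses so training gradually shifts from the
    auxiliary tasks to the primary task."""
    w = schedule(epoch)
    return (1.0 - w) * primary + w * auxiliary
```

Under either schedule the auxiliary loss dominates early and the primary loss takes over later; SeqL, by contrast, corresponds to an abrupt switch, with weight 1 on the auxiliary tasks for the first 100 epochs and 0 afterwards.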




6. Conclusion and Future Work
In this work-in-progress paper, several multi-task curriculum learning strategies were evaluated
for forecasting the energy consumption of auxiliary subsystems in heavy-duty electric vehi-
cles. The preliminary results show that progressive curriculum learning achieved the best
performance (lowest average MSE) compared to multi-task learning without any CL, sequential
CL, and the traditional single-task approach (STL). Moreover, the proposed CL auxiliary tasks,
based on key consumption characteristics, proved useful for solving all four primary tasks,
in terms of both regression error and convergence speed.
   Future work includes: (i) developing methods to rank the auxiliary tasks by relevance and
select the top 𝑘 tasks for CL; (ii) proposing adaptive methods for governing the learning
process, e.g., by weighting the losses of auxiliary tasks using learning dynamics; (iii) enabling
CL across primary tasks based on task relevance.


Acknowledgments
The work was carried out with support from the Knowledge Foundation and Vinnova (Sweden’s
innovation agency) through the Vehicle Strategic Research and Innovation Programme FFI.

