<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>August</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Parked Vehicles Assisted Task Offloading Based on Deep Reinforcement Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Guangting Lu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhuojun Lv</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zheng Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Feng Zeng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computer Science and Engineering, Central South University</institution>
          ,
          <addr-line>Changsha 410083</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>1</volume>
      <fpage>9</fpage>
      <lpage>22</lpage>
      <abstract>
        <p>As demand continues to grow, edge servers are increasingly constrained by their limited computing resources. In addition, large-scale deployment of edge servers will inevitably lead to unnecessary waste of resources. To further expand the resources of Vehicular Edge Computing, in this paper we point out that the idle resources of parked vehicles can be integrated to assist edge servers in processing offloaded tasks, and we propose a computation offloading framework for parking-cluster collaboration. In this framework, the computing task of each vehicle is composed of multiple subtasks with dependencies between them. To efficiently manage the heterogeneous resources in the framework, a layered offloading method based on deep reinforcement learning is proposed to minimize the average completion time of all vehicles. Simulation results show that the proposed method outperforms three baseline methods in terms of task processing time and task execution success rate.</p>
      </abstract>
      <kwd-group>
        <kwd>Edge Computing</kwd>
        <kwd>Deep Reinforcement Learning</kwd>
        <kwd>Dependent Task Offloading</kwd>
        <kwd>Parked Vehicles</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In Vehicular Edge Computing, due to their very limited computing and storage resources, vehicles
are often unable to locally process computation-intensive and latency-sensitive intelligent
applications, such as digital twins and augmented reality [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. As a complementary
solution, cloud computing meets the service needs of some computation-intensive tasks by
offloading applications to cloud servers for execution [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, cloud servers are usually
located in data centers far away from vehicles, which leads to higher bandwidth consumption
and increased communication latency during task offloading, thus affecting the performance
of computation offloading. To this end, Vehicular Edge Computing (VEC) came into being:
its idea is to provide highly reliable, low-latency computation offloading services to vehicle
users by deploying edge servers on both sides of the road [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, as demand continues
to grow, edge servers are increasingly constrained by their limited computing resources. To
optimize the resource allocation of a single edge server, some scholars [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ] have devoted
themselves to taking one or more performance indicators as the optimization goal and modeling
the computation offloading problem as an optimization model. However, as the number of
requests for computation offloading services increases, simply optimizing the resource
configuration of a single edge server still leaves the problem of insufficient resources.
To this end, some scholars [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ] have explored the resource scheduling problem across multiple
edge servers and achieved resource collaborative scheduling among multiple edge servers by
building a resource load balancing model. However, due to spatio-temporal differences
in the distribution of vehicles, servers in the same area often face similar load pressures.
Thus, migrating computing tasks to distant edge servers for collaborative processing may result
in higher service delays.
      </p>
      <p>
        Considering that idle computing resources near vehicles are ubiquitous and do not require
additional deployment, some scholars have studied the use of neighboring vehicles to expand
the capabilities and service scope of edge computing. Some works [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ] have studied how
to use mobile vehicles as edge servers to assist in ofloading. However, the rapid movement
of vehicles may cause frequent changes in communication channels and interruptions in task
ofloading, which in turn afects the performance of computation ofloading. Liu et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
investigated the availability of parked vehicles and pointed out that parked vehicles have the
characteristics of dense distribution, long parking time, and fixed location, which can provide
stable network connections and computing resources, making them potential computing devices
in the infrastructure [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Based on this, Reis et al. [13] proposed adding parked vehicles as static
nodes to VEC, forming the concept of parking assistance and developing it into a new type of
hybrid network. To alleviate the computing pressure on edge servers, some scholars have studied
how to use the computing and communication resources of parked vehicles to collaborate with
the edge for computation offloading. Kadhim et al. [14] integrated Software Defined Networks
and fog computing, used parked vehicles as auxiliary nodes for fog computing, and proposed a
load balancing mechanism. Pham et al. [15] studied partial computation offloading in parked
vehicle-assisted multi-access edge computing and used the subgradient method to optimize
the offloading ratio and resource allocation. Ma et al. [16] organized parked vehicles into
parking clusters and theoretically proved the long-term stability of the number of vehicles
in a parking cluster. Zhao et al. [17] organized parked vehicles into static service nodes in a
scenario where edge infrastructure was limited and proposed a task offloading algorithm based
on reinforcement learning.
      </p>
      <p>However, the scenarios considered in the above research on parked-vehicle-assisted edge
computing are too idealistic: the main scenario considered is the collaborative offloading
problem between a single edge server and multiple unassociated parked vehicles. In addition,
the dependencies between subtasks are not taken into account, which limits the potential of
parallel processing in edge computing and makes it difficult to meet the needs of low-latency
services. In the face of the shortcomings and challenges of existing research, the main
contributions of this work are summarized as follows:
• We propose to integrate parked vehicles into parking clusters and design a dependent
task computation offloading framework in which multiple parking clusters collaborate with
a single edge server.
• We propose a deep reinforcement learning algorithm based on a multi-actor and
single-critic network architecture to minimize the average completion time of the application.
Guided by a single critic network, multiple actor networks efficiently divide the decision
action space into two layers: the first layer determines the location of task execution
(locally, on the edge server, or in a parking cluster); the second layer selects the specific
parked vehicle to execute the task. This approach not only reduces the action space that
each actor network handles, but also significantly improves the overall performance and
efficiency of the system.</p>
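      <p>To make the two-layer decision concrete, the following is a minimal illustrative sketch (not the authors' implementation); the function names, cluster count, and cluster sizes are assumptions introduced here for illustration only:</p>

```python
import random

# Hypothetical sketch of the layered action space described above.
# Layer 1 picks WHERE to run a subtask: locally (0), on PVC m (1..M),
# or on the VEC server (M + 1). Layer 2 picks WHICH parked vehicle,
# and is only consulted when layer 1 selects a parking cluster.

M = 3                      # number of parking clusters (assumption)
K = [4, 5, 3]              # parked vehicles per cluster (assumption)

def choose_layer1(m_clusters):
    """First-layer action: 0 = local, 1..M = PVC index, M+1 = VEC server."""
    return random.randint(0, m_clusters + 1)

def choose_layer2(cluster_sizes, pvc_index):
    """Second-layer action: index of a parked vehicle inside the chosen PVC."""
    return random.randint(0, cluster_sizes[pvc_index - 1] - 1)

def joint_action(m_clusters, cluster_sizes):
    a1 = choose_layer1(m_clusters)
    a2 = None
    if 1 <= a1 <= m_clusters:          # only PVCs need a second-layer choice
        a2 = choose_layer2(cluster_sizes, a1)
    return a1, a2

a1, a2 = joint_action(M, K)
assert 0 <= a1 <= M + 1
```

      <p>Note how the split shrinks each actor's action space: instead of one flat space covering every device, the first actor sees at most M + 2 choices and the second at most the size of one cluster.</p>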
      <p>The remainder of the paper is organized as follows: Section II presents the system model studied
in this paper, including scenario modeling, task modeling, computational modeling, and the
formalization of the optimization objective. Section III introduces the
Double Actor-Layered Deep Deterministic Policy Gradient (DALDDPG) algorithm: we first
model the decision-making process of the studied scenario as a Markov decision
process; then the network structure of the DALDDPG algorithm, the update method of each
network, and the DALDDPG pseudocode are introduced. Section IV evaluates the effectiveness
of the proposed algorithm by comparing it with existing algorithms. Finally, we conclude this
paper in Section V.</p>
    </sec>
    <sec id="sec-2">
      <title>2. System model</title>
      <p>As illustrated in Figure 1, we consider a computation offloading scenario involving multiple parked
vehicles, multiple task vehicles, and a single VEC server. To effectively manage the resources of
parked vehicles and facilitate task cooperation among them, we group parked vehicles into
multiple Parking Vehicle Clusters (PVC) and designate a Cluster Management Vehicle (CMV)
within each cluster. The primary responsibility of the CMV is to maintain basic information
about the vehicles within the cluster and report this information to the Road Side Unit (RSU) of
its area via V2I communication regularly. Considering that the communication cost of the CMV
with exterior entities is usually greater than its communication cost within the cluster, we do
not consider the communication delay between the CMV and other vehicles within the cluster.
Furthermore, the CMV is regarded as a bridge for the entire cluster to communicate with the
exterior, responsible for accurately forwarding messages to the targeted parked vehicles.</p>
      <p>We posit that there are M PVCs on the road, denoted as {PVC_m | m = 1, 2, ..., M}. In each
PVC_m there are K_m parked vehicles, where PV_{m,1} represents the CMV of the m-th PVC, and
PV_{m,k} represents the k-th vehicle in the m-th PVC. The computing resource set of the parked
vehicles in each PVC_m is represented as {f_{m,k} | k = 1, 2, ..., K_m}, where f_{m,k} signifies the CPU
clock frequency of the k-th vehicle in the m-th PVC. Moreover, there are N task vehicles on
the road, which can either connect to the RSU in their coverage range through V2I to access the
VEC server, or connect to a PVC through V2V.</p>
      <sec id="sec-2-1">
        <title>2.1. Task model</title>
        <p>In this paper, we model the dependent subtask relationships derived from the application A_n,
generated by vehicle n, as G_n = (V_n, E_n). Here, V_n = {v_i | i = 0, 1, ..., I_n, I_n + 1} represents the
I_n + 2 subtasks of A_n. Specifically, v_0 and v_{I_n+1} represent the virtual entry and exit subtasks
of A_n, respectively. These two virtual tasks are established to ensure that A_n starts and
ends on vehicle n. Each edge within the set E_n denotes a dependency relationship between
subtasks of A_n. Specifically, an edge (v_i, v_j) ∈ E_n indicates that the result of v_i must
be transmitted to v_j before v_j can commence its execution. The tuple {d_i, o_i, c_i, t_i^max} is
defined to characterize the i-th subtask v_i of A_n, where d_i is the input data volume of v_i, o_i
is the output data volume resulting from executing v_i, c_i represents the number of CPU cycles
required to execute v_i, and t_i^max is the maximum tolerable delay of v_i.</p>
        <p>The set of computing devices available for offloading services within the communication
range of the task vehicle is denoted by D = {D_0, D_1, ..., D_M, D_{M+1}}, where D_0 represents the
task vehicle itself, D_1 to D_M represent PVCs, and D_{M+1} represents the VEC server. The decision
variable x_{i,j} indicates whether subtask v_i is offloaded to computing device D_j, and is defined as follows:
x_{i,j} = 1 if subtask v_i is offloaded to device D_j; 0 otherwise.   (1)</p>
        <p>The decision variable y_{i,m,k} indicates whether subtask v_i is executed on parked vehicle PV_{m,k}
in PVC_m, and is defined as follows:
y_{i,m,k} = 1 if subtask v_i is executed on parked vehicle PV_{m,k} in PVC_m; 0 otherwise.   (2)</p>
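        <p>As a hedged sketch (the data structures are assumptions introduced here, not the authors' code), the task DAG and the one-hot offloading decision per subtask can be represented as:</p>

```python
# Hypothetical sketch of the task model: an application is a DAG of
# subtasks with virtual entry/exit nodes, plus a one-hot offloading
# decision vector x[i] per subtask over devices {local, PVCs, VEC}.

# Subtask i -> (input data d_i, output data o_i, CPU cycles c_i); units assumed.
tasks = {0: (0.0, 0.0, 0.0),      # virtual entry
         1: (2.0, 0.5, 1.2),
         2: (1.5, 0.3, 0.8),
         3: (0.0, 0.0, 0.0)}      # virtual exit
# Edge (i, j): the result of i must reach j before j starts.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]

def predecessors(i):
    return [u for (u, v) in edges if v == i]

# x[i] is one-hot over devices {0: local, 1..M: PVCs, M+1: VEC server}.
M = 2
x = {i: [0] * (M + 2) for i in tasks}
x[1][M + 1] = 1        # subtask 1 -> VEC server
x[2][1] = 1            # subtask 2 -> PVC 1
x[0][0] = x[3][0] = 1  # virtual entry/exit stay on the task vehicle

# Each subtask runs on exactly one device (constraint in the text).
assert all(sum(x[i]) == 1 for i in tasks)
assert predecessors(3) == [1, 2]
```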
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Computational model</title>
        <p>In this paper, we assume that the VEC server, parked vehicles, and local vehicles can only handle
one subtask at a time, and that each subtask can only be processed on one computing device.</p>
        <sec id="sec-2-2-1">
          <title>2.2.1. Local computing model</title>
          <p>When x_{i,0} = 1, subtask v_i is executed locally. The local ready time T_i^{rdy,l} of v_i is the time
at which all predecessor tasks of v_i have been executed and their results have been transmitted
back to the local vehicle. T_i^{rdy,l} can be expressed as follows:
T_i^{rdy,l} = max_{v_j ∈ pred(v_i)} { T_j + t_j^{back} }   (3)
where pred(v_i) is the set of all predecessor tasks of v_i; T_j refers to the completion time
of v_j on the designated computing device based on the offloading decision; and t_j^{back} is the
time required to transmit the execution results of v_j back to vehicle n. When v_i is ready locally, it
may not immediately be scheduled for execution due to the local queuing execution time.
The completion time of v_i, when executed locally, is denoted as T_i^{cmp,l} and
can be expressed as follows:
T_i^{cmp,l} = max{ T_i^{rdy,l}, T_0^{avl} } + c_i / f_0   (4)</p>
          <p>where T_0^{avl} stands for the earliest possible scheduling time for the local execution of v_i, and
f_0 denotes the computing capacity of the local terminal.</p>
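          <p>The ready/completion-time recursion above can be sketched as follows; this is an illustrative toy, with all frequencies, cycle counts, and return delays assumed, not taken from the paper:</p>

```python
# Minimal sketch of the local computing model: a subtask is ready when
# every predecessor has finished and shipped its result back; it then
# completes after queuing (the device runs one task at a time) plus its
# execution time c_i / f0. All numbers below are illustrative.

f0 = 0.5          # local CPU frequency in GHz (assumption)
c = {1: 1.0, 2: 0.6, 3: 0.4}          # Gcycles per subtask
t_back = {1: 0.1, 2: 0.05, 3: 0.02}   # result-return delay per subtask [s]
preds = {1: [], 2: [], 3: [1, 2]}     # dependency edges of the DAG

finish = {}
device_free = 0.0                      # local device handles one task at a time
for i in (1, 2, 3):                    # a topological order of the DAG
    ready = max((finish[j] + t_back[j] for j in preds[i]), default=0.0)
    start = max(ready, device_free)    # max{ready time, earliest free slot}
    finish[i] = start + c[i] / f0
    device_free = finish[i]

# Subtask 3 cannot start before both predecessors returned their results.
assert finish[3] >= finish[2] + t_back[2]
```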
        </sec>
        <sec id="sec-2-2-2">
          <title>2.2.2. VEC computing model</title>
          <p>When x_{i,M+1} = 1, subtask v_i is carried out on the VEC server. The transmission delay for
vehicle n when uploading data d_i to the VEC server can be represented as:
t_{n,i}^{up} = d_i / r_{n,VEC}   (5)
where r_{n,VEC} is the V2I transmission rate between vehicle n and the VEC server.</p>
          <p>The ready time of v_i on the VEC server, denoted as T_i^{rdy,v}, comprises two components:
the upload time of d_i to the VEC server, and the time at which all precursor tasks of v_i are
completed and their results are delivered back. Therefore, T_i^{rdy,v} can be expressed as:
T_i^{rdy,v} = max{ t_{n,i}^{up}, max_{v_j ∈ pred(v_i)} ( T_j + t_j^{back} ) }   (6)</p>
          <p>Once subtask v_i is ready on the VEC server, it may not necessarily be immediately scheduled
for execution due to the queuing execution time on the VEC server. The completion time of v_i,
when executed on the VEC server, is denoted as T_i^{cmp,v} and can be expressed as:
T_i^{cmp,v} = max{ T_i^{rdy,v}, T_v^{avl} } + c_i / f_v   (7)
where T_v^{avl} is the earliest possible scheduling time of v_i on the VEC server, and f_v
denotes the computing capacity of the VEC server.</p>
        </sec>
        <sec id="sec-2-2-3">
          <title>2.2.3. PVC computing model</title>
          <p>When x_{i,m} = 1 (where m ≠ 0 and m ≠ M + 1), subtask v_i is executed on a parked vehicle
within PVC_m. The transmission delay for vehicle n when uploading data d_i to PVC_m can be
represented as:
t_{n,i,m}^{up} = d_i / r_{n,m}   (8)
where r_{n,m} is the V2V transmission rate between vehicle n and PVC_m.</p>
          <p>The ready time T_{i,m,k}^{rdy} of task v_i on parked vehicle PV_{m,k} within PVC_m includes two parts: the
time required to upload d_i to PVC_m, and the time at which all precursor tasks of v_i have been
completed and their results are delivered back. Therefore, T_{i,m,k}^{rdy} can be expressed as:
T_{i,m,k}^{rdy} = max{ t_{n,i,m}^{up}, max_{v_j ∈ pred(v_i)} ( T_j + t_j^{back} ) }   (9)</p>
          <p>Once task v_i is ready on the parked vehicle in PVC_m, it may not immediately be scheduled
for execution due to the queuing execution time on the parked vehicle. The completion time
T_{i,m,k}^{cmp} of v_i, when executed on parked vehicle PV_{m,k} in PVC_m, can be expressed as:
T_{i,m,k}^{cmp} = max{ T_{i,m,k}^{rdy}, T_{m,k}^{avl} } + c_i / f_{m,k}   (10)
where T_{m,k}^{avl} denotes the earliest scheduling time for v_i to be executed on the k-th parked
vehicle in PVC_m, and f_{m,k} represents the computing capacity of the k-th parked vehicle within
PVC_m.</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Problem formulation</title>
        <p>The actual completion time of subtask v_i of application A_n, denoted as T_i, based on the
current offloading decisions, can be expressed as:
T_i = x_{i,0} T_i^{cmp,l} + x_{i,M+1} T_i^{cmp,v} + Σ_{m=1}^{M} Σ_{k=1}^{K_m} y_{i,m,k} T_{i,m,k}^{cmp}   (11)</p>
        <p>The actual completion time T_n of A_n is the actual completion time of the virtual exit
subtask v_{I_n+1} and can be represented as:
T_n = T_{I_n+1}   (12)</p>
        <p>The main objective of this work is to minimize the average completion time of system
applications under the condition that each task is completed within its maximum tolerable
delay. The optimization problem is formulated as follows:
min (1/N) Σ_{n=1}^{N} T_n
s.t. C1: Σ_{j=0}^{M+1} x_{i,j} = 1, ∀ v_i ∈ V_n,
C2: T_i ≤ t_i^{max}, T_n ≤ t_n^{max},
where C1 stipulates that each task can only be executed on a single computing device, and
C2 ensures that the actual completion time of each application and its respective subtasks
remains within their maximum tolerable delay.</p>
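        <p>As a hedged sketch of the objective and its feasibility check (all completion times and deadlines below are illustrative numbers, not simulation results):</p>

```python
# The optimization goal: average completion time over N applications,
# subject to each application finishing within its maximum tolerable
# delay. Values here are illustrative placeholders only.

T = [1.8, 2.4, 3.1]          # actual completion times T_n of N = 3 apps [s]
deadline = [2.0, 3.0, 3.5]   # maximum tolerable delays [s]

objective = sum(T) / len(T)                    # (1/N) * sum_n T_n
feasible = all(t <= d for t, d in zip(T, deadline))

assert feasible
assert abs(objective - (1.8 + 2.4 + 3.1) / 3) < 1e-9
```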
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Design of algorithm</title>
      <p>Given that the above optimization problem is a complex mixed-integer linear programming
problem, traditional optimization algorithms struggle to solve it effectively. Moreover, to efficiently
manage the heterogeneous resources in the proposed scenario, we propose a layered task offloading
scheduling algorithm based on deep reinforcement learning with a multi-actor and single-critic
network. For this purpose, we first model the offloading scheduling process for dependent tasks
as a Markov Decision Process (MDP). Below, we provide the formal expressions for the state
space, action space, and reward function of the MDP.</p>
      <p>State. At time t, the Actor1 network in the first layer is responsible for offloading the subtasks
of application A_n to the local vehicle, a PVC, or the VEC server. The local state s_t^1 observed
by the Actor1 network includes four main components: the position of the CMVs, the available
computing resources of the parked vehicles, the sequence of tasks that have already been scheduled,
and the collection of task priority sequences. Therefore, s_t^1 can be abstractly defined as follows:
s_t^1 = { pos_CMV, F_PV, seq_sched, seq_prio }   (13)</p>
      <p>The Actor2 network of the second layer is responsible for offloading the subtasks of application
A_n to specific parked vehicles for execution. The local state s_t^2 observed by the Actor2
network includes three main parts: the available computing resources of the parked vehicles,
the processing time required for tasks pending in the compute queue of each parked vehicle, and
the set of task priority sequences. Therefore, s_t^2 can be abstractly defined as follows:
s_t^2 = { F_PV, T_queue, seq_prio }   (14)</p>
      <p>Actions. In the layered action space, for subtask v_i, the first-layer action a_t^1 that the
Actor1 network can take is represented as:
a_t^1 = { x_{i,0}, x_{i,1}, ..., x_{i,m}, ..., x_{i,M+1} }   (15)
where a_t^1 determines the allocation level of task v_i: if x_{i,0} = 1, v_i is executed locally; if
x_{i,M+1} = 1, v_i is executed on the VEC server; if x_{i,m} = 1 (where m ≠ 0 and m ≠ M + 1), v_i
is executed on the m-th PVC. Based on the decision of the first layer, the second-layer action
a_t^2 that the Actor2 network can take is defined as:
a_t^2 = { y_{i,m,1}, y_{i,m,2}, ..., y_{i,m,K_m} }   (16)
where a_t^2 specifies that, within the layer determined by a_t^1, task v_i is further offloaded to a
specific parked vehicle: if y_{i,m,k} = 1, v_i is executed on the k-th parked vehicle.</p>
      <p>Rewards. After executing the joint action a_t = { a_t^1, a_t^2 } under the global state s_t =
{ s_t^1 ∪ s_t^2 }, the Agent receives an immediate reward r_t from the environment, which can be
expressed as follows:
r_t = ( T(G_{1:i}) − T(G_{1:i+1}) ) / T_local(G)   (17)
where T(G_{1:i}) denotes the time spent on the subgraph of tasks that have been
scheduled under state s_t, and T_local(G) represents the delay when all scheduled tasks are
executed locally.</p>
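      <p>The reward can be sketched as below; the exact sign convention and normalisation are assumptions reconstructed from the surrounding description, so this is illustrative rather than the authors' definition:</p>

```python
# Hedged reconstruction of the reward: the (negative) incremental
# makespan caused by scheduling one more subtask, normalised by the
# delay of executing everything locally, so placements that grow the
# schedule less earn larger (less negative) rewards.

def reward(t_sched_before, t_sched_after, t_all_local):
    """r_t = (T(G_1:i) - T(G_1:i+1)) / T_local(G)."""
    return (t_sched_before - t_sched_after) / t_all_local

# Scheduling subtask i+1 grows the partial makespan from 3.0 s to 3.5 s
# against an 8.0 s all-local baseline:
r = reward(3.0, 3.5, 8.0)
assert r == -0.0625   # a smaller growth in makespan is penalised less
```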
      <p>The Double Actor-Layered Deep Deterministic Policy Gradient (DALDDPG) algorithm
comprises six networks: the Actor1 network μ1(s^1|θ^1); the Actor2 network μ2(s^2|θ^2); their
respective target networks μ1′(s^1|θ^1′) and μ2′(s^2|θ^2′); a Critic network
Q(s, a|ω); and a corresponding target network Q′(s, a|ω′). In the decision-making process,
the Actor1 and Actor2 networks independently make first-layer and second-layer decisions
based on their local states. The Agent subsequently combines these two decisions (a^1, a^2) into
a joint decision a, which is then executed. Following this execution, the global state s and
local states s^1 and s^2 move to the next state, and the environment provides the Agent with an
immediate reward r. The Agent then stores the experience tuple (s_t, a_t, s_{t+1}, r_t) from the
interaction with the environment in the sample pool. During the training phase, a batch of samples is
periodically drawn from the experience pool, and the target value y_t for each sample is calculated
through the target network Q′(s, a|ω′):</p>
      <p>y_t = r_t + γ Q′( s_{t+1}, a′_{t+1} | ω′ )   (18)
where a′_{t+1} = μ1′( s^1_{t+1} | θ^1′ ) ∪ μ2′( s^2_{t+1} | θ^2′ ), and γ is the discount factor.</p>
      <p>In this paper, we minimize the loss function L(ω) using gradient descent, based on the
Temporal Difference algorithm, to update the weight parameter ω of the Critic. L(ω) can be
expressed as follows:
L(ω) = (1/B) Σ_{t=1}^{B} ( y_t − Q(s_t, a_t|ω) )^2   (19)
where B represents the number of samples drawn from the sample pool.</p>
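      <p>A minimal numeric sketch of the TD target and critic loss, with stand-in functions playing the role of the online and target critics (the network forms and all values are assumptions, not the trained models):</p>

```python
import numpy as np

# Sketch of the TD update: build the target y_t with a stand-in target
# critic, then take the mean-squared TD error over a batch of samples.

rng = np.random.default_rng(0)
B, gamma = 4, 0.95                       # batch size and discount factor

def q_target(state, action):             # stand-in for Q'(s, a | w')
    return np.tanh(state.sum(axis=1) + action.sum(axis=1))

def q_online(state, action):             # stand-in for Q(s, a | w)
    return 0.5 * np.tanh(state.sum(axis=1) + action.sum(axis=1))

s, a = rng.normal(size=(B, 6)), rng.normal(size=(B, 3))
s_next, a_next = rng.normal(size=(B, 6)), rng.normal(size=(B, 3))
r = rng.normal(size=B)

y = r + gamma * q_target(s_next, a_next)          # TD target per sample
loss = np.mean((y - q_online(s, a)) ** 2)         # mean-squared TD error
assert loss >= 0.0
```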
      <p>Under the evaluation of the Critic, we employ gradient ascent to update the parameters of
the Actor1 and Actor2 networks. The policy gradients are expressed as follows:
∇_{θ^1} J = (1/B) Σ_{t=1}^{B} ∇_a Q(s_t, a_t|ω) ∇_{θ^1} μ1( s^1_t | θ^1 )   (20)
∇_{θ^2} J = (1/B) Σ_{t=1}^{B} ∇_a Q(s_t, a_t|ω) ∇_{θ^2} μ2( s^2_t | θ^2 )   (21)</p>
      <p>To update the parameters of the target networks, a soft update strategy is employed. The
updates for all target networks are expressed as follows:
θ^1′ = τ θ^1 + (1 − τ) θ^1′
θ^2′ = τ θ^2 + (1 − τ) θ^2′
ω′ = τ ω + (1 − τ) ω′   (22)
where τ is the soft update coefficient.</p>
      <p>Algorithm 1: DALDDPG algorithm</p>
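      <p>The soft (Polyak) target update above can be sketched as follows; the parameter shapes and the value of τ are assumptions for illustration:</p>

```python
import numpy as np

# Minimal sketch of the soft target update:
# theta' <- tau * theta + (1 - tau) * theta', applied per parameter array.

tau = 0.005
theta = [np.ones(3), np.full(2, 4.0)]        # online-network parameters
theta_target = [np.zeros(3), np.zeros(2)]    # target-network parameters

theta_target = [tau * w + (1.0 - tau) * wt
                for w, wt in zip(theta, theta_target)]

# After one step the target moves a small fraction toward the online net.
assert np.allclose(theta_target[0], 0.005)
assert np.allclose(theta_target[1], 0.02)
```

      <p>Small τ keeps the targets slowly moving, which stabilises the bootstrapped TD targets.</p>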
    </sec>
    <sec id="sec-4">
      <title>4. Simulation and result analysis</title>
      <p>We use Python 3.7 and TensorFlow 2.0 for simulations. The simulation scenario considers a
400-meter road populated with N ∈ [5, 30] task vehicles, alongside a VEC server and M ∈ [3, 6]
PVCs, each containing K_m ∈ [3, 7] parked vehicles. The computing capacities of the task and
parked vehicles are (0, 0.5] GHz and [1, 2] GHz, respectively. The VEC server has a computing
capacity of 6 GHz. The sample pool capacity is 2000, and the batch size for sampling is 64.</p>
      <p>To evaluate the performance of the proposed strategy, we compare it with the following three
offloading strategies:
• RS: Tasks are randomly assigned to be executed locally on the vehicle, on the VEC server,
or on a parked vehicle;
• HUDQN: Tasks are executed according to the offloading decisions made in the first layer only;
• SLDQN: The two-layer offloading decisions are consolidated into a single-layer
framework.</p>
      <p>As depicted in Figure 2, as the number of training epochs increases, the rewards under the various
learning rates increase and then stabilize. To balance convergence speed with system stability, we
adopt a learning rate of 6 × 10^{-4} for model training.</p>
      <p>As shown in Figure 3, the average completion time of each strategy escalates with the
increase in the number of applications, yet the proposed strategy consistently exhibits the
lowest completion time. Compared to the other three strategies, our strategy reduces the
average completion time by 11.47%, 25.41%, and 51.01% on average, respectively. Figure 4 shows
that as the number of PVCs increases, the average completion time for each strategy decreases,
with the proposed strategy performing the best. Compared to the other three strategies, the
proposed strategy reduces the average completion time by 15.42% to 26.58% on average. As
depicted in Figure 5, with rising task computational complexity, the task completion rates for all
strategies gradually decrease, but the proposed strategy maintains the highest completion rate.
Compared to the other three strategies, the proposed strategy enhances the completion rate by
an average of 42.52%, 13.21%, and 4.21%, respectively. These improvements arise because
the proposed strategy accounts for the heterogeneity of parked-vehicle resources and
adopts a layered task offloading scheduling framework to optimize task allocation, which
significantly improves the performance and efficiency of the system.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and future work</title>
      <p>In this paper, we design a dependent task scheduling framework in which
multiple parking clusters cooperate with a single edge server. In addition, we propose a deep
reinforcement learning algorithm based on a multi-actor and single-critic network architecture
to minimize the average completion time of the application. Simulation results show that the
proposed algorithm outperforms the three baseline algorithms in terms of
task processing time and task execution success rate. Future work will explore task offloading
and resource scheduling within a VEC system assisted by multiple parking clusters, while also
considering the energy consumption cost of parked vehicles.</p>
      <p>[12] … of vehicles as the infrastructures, IEEE Transactions on Vehicular Technology 65 (2016)
3860–3873.
[13] A. B. Reis, S. Sargento, O. K. Tonguz, Parked cars are excellent roadside units, IEEE
Transactions on Intelligent Transportation Systems 18 (2017) 2490–2502.
[14] A. J. Kadhim, J. I. Naser, Proactive load balancing mechanism for fog computing supported
by parked vehicles in IoV-SDN, China Communications 18 (2021) 271–289.
[15] X.-Q. Pham, T. Huynh-The, E.-N. Huh, D.-S. Kim, Partial computation offloading in
parked vehicle-assisted multi-access edge computing: A game-theoretic approach, IEEE
Transactions on Vehicular Technology 71 (2022) 10220–10225.
[16] C. Ma, J. Zhu, M. Liu, H. Zhao, N. Liu, X. Zou, Parking edge computing:
Parked-vehicle-assisted task offloading for urban VANETs, IEEE Internet of Things Journal 8 (2021)
9344–9358.
[17] H. Zhao, J. Hua, Z. Zhang, J. Zhu, Deep reinforcement learning-based task offloading for
parked vehicle cooperation in vehicular edge computing, Mobile Information Systems
2022 (2022) 9218266.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hua</surname>
          </string-name>
          ,
          <article-title>A survey on in-vehicle time-sensitive networking</article-title>
          ,
          <source>IEEE Internet of Things Journal</source>
          <volume>10</volume>
          (
          <year>2023</year>
          )
          <fpage>14375</fpage>
          -
          <lpage>14396</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <article-title>Vehicle computing: Vision and challenges</article-title>
          ,
          <source>Journal of Information and Intelligence</source>
          <volume>1</volume>
          (
          <year>2023</year>
          )
          <fpage>23</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Raeisi-Varzaneh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Dakkak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Habbal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.-S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Resource scheduling in edge computing: Architecture, taxonomy, open issues and future research directions</article-title>
          ,
          <source>IEEE Access 11</source>
          (
          <year>2023</year>
          )
          <fpage>25329</fpage>
          -
          <lpage>25350</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. C.</given-names>
            <surname>Leung</surname>
          </string-name>
          ,
          <article-title>Deep reinforcement learning for energy-efficient computation offloading in mobile-edge computing</article-title>
          ,
          <source>IEEE Internet of Things Journal</source>
          <volume>9</volume>
          (
          <year>2021</year>
          )
          <fpage>1517</fpage>
          -
          <lpage>1530</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <article-title>A drl agent for jointly optimizing computation ofloading and resource allocation in mec</article-title>
          ,
          <source>IEEE Internet of Things Journal</source>
          <volume>8</volume>
          (
          <year>2021</year>
          )
          <fpage>17508</fpage>
          -
          <lpage>17524</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Seid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. O.</given-names>
            <surname>Boateng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Anokye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kwantwi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Collaborative computation offloading and resource allocation in multi-UAV-assisted IoT networks: A deep reinforcement learning approach</article-title>
          ,
          <source>IEEE Internet of Things Journal</source>
          <volume>8</volume>
          (
          <year>2021</year>
          )
          <fpage>12203</fpage>
          -
          <lpage>12218</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>W.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Game-based task offloading and resource allocation for vehicular edge computing with edge-edge cooperation</article-title>
          ,
          <source>IEEE Transactions on Vehicular Technology</source>
          <volume>72</volume>
          (
          <year>2023</year>
          )
          <fpage>7857</fpage>
          -
          <lpage>7870</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wan</surname>
          </string-name>
          ,
          <article-title>Deep reinforcement learning for load balancing of edge servers in IoV</article-title>
          ,
          <source>Mobile Networks and Applications</source>
          <volume>27</volume>
          (
          <year>2022</year>
          )
          <fpage>1461</fpage>
          -
          <lpage>1474</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Leng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yuen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. L.</given-names>
            <surname>Guan</surname>
          </string-name>
          ,
          <article-title>Intelligent task offloading for heterogeneous V2X communications</article-title>
          ,
          <source>IEEE Transactions on Intelligent Transportation Systems</source>
          <volume>22</volume>
          (
          <year>2020</year>
          )
          <fpage>2226</fpage>
          -
          <lpage>2238</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>W.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Joint task offloading and resource allocation for vehicular edge computing based on V2I and V2V modes</article-title>
          ,
          <source>IEEE Transactions on Intelligent Transportation Systems</source>
          <volume>24</volume>
          (
          <year>2023</year>
          )
          <fpage>4277</fpage>
          -
          <lpage>4292</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <article-title>PVA in VANETs: Stopped cars are not silent</article-title>
          , in:
          <source>2011 Proceedings IEEE INFOCOM</source>
          , IEEE
          ,
          <year>2011</year>
          , pp.
          <fpage>431</fpage>
          -
          <lpage>435</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>X.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Vehicular fog computing: A viewpoint</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>