<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Development and evaluation of an adaptive routing algorithm for C2C logistics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Danylo Kovalenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iryna Zamrii</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>State University of Information and Communication Technologies</institution>
          ,
          <addr-line>Solomyanska Street 7, 03110, Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <volume>9</volume>
      <fpage>406</fpage>
      <lpage>411</lpage>
      <abstract>
        <p>Modern C2C logistics systems face challenges related to demand variability, limited resources, and the need for adaptive routing. This study presents an approach to developing an adaptive routing methodology based on the combination of reinforcement learning (RL) and machine learning (ML) for demand prediction. In the first stage of the research, an algorithm was developed that utilizes RL agents for dynamically determining optimal routes and LSTM networks for demand forecasting. Simulation testing demonstrated improvements in key metrics, including reduced delivery time and increased resource utilization efficiency. Currently, an experimental validation of the algorithm in real-world conditions is being conducted, with its results to be used for formalizing the methodology. The obtained data is expected to contribute to the development of a universal adaptive routing methodology, enhancing the flexibility and efficiency of C2C logistics.</p>
      </abstract>
      <kwd-group>
        <kwd>C2C logistics</kwd>
        <kwd>adaptive routing</kwd>
        <kwd>reinforcement learning</kwd>
        <kwd>demand forecasting</kwd>
        <kwd>multi-agent systems</kwd>
        <kwd>last-mile logistics</kwd>
        <kwd>route optimization</kwd>
        <kwd>digital twins</kwd>
        <kwd>cloud computing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        C2C (Customer-to-Customer) logistics plays a crucial role in modern e-commerce by enabling fast
and convenient deliveries between private individuals. Unlike traditional B2C logistics, where the
delivery process is centralized, C2C models are characterized by high dynamism, uneven resource
distribution, and fluctuating demand [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. As a result, routing problems arise that cannot be
effectively solved using static methods, since the system's state changes rapidly. Studies show that
classical shortest-path algorithms, such as Dijkstra’s algorithm or A*, perform well in fixed road
networks but fail to efficiently adapt routes to changes in traffic and demand distribution [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        One approach to overcoming these limitations is the use of Reinforcement Learning (RL), which
enables agents to make real-time decisions based on historical and current system data. RL is
applied to routing problems where decisions need to be adapted to changing environmental
conditions, such as traffic congestion, adverse weather, or demand fluctuations [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], it was
demonstrated that Q-learning-based algorithms produce routes closer to the optimum for dynamic logistics
systems compared to traditional methods. Additionally, it was found that deep neural networks
(Deep Q-Network, DQN) significantly enhance the adaptability of RL models, but their
effectiveness depends on data quality and the accuracy of future state predictions.
      </p>
      <p>A critical component of successful RL application is the ability to forecast future system load.
The use of deep neural networks, such as Long Short-Term Memory (LSTM), enables the
incorporation of temporal dependencies in input data, allowing for the prediction of new order
placements in different locations [5]. The study in [6] demonstrated that combining demand
forecasting with routing algorithms can significantly improve the efficiency of logistics systems by
reducing the number of "empty" trips and optimizing courier workload. This is particularly
relevant for urban logistics platforms, where demand distribution can change rapidly.</p>
      <p>This research focuses on developing an adaptive routing methodology for C2C logistics by
integrating reinforcement learning and demand forecasting. At the first stage, an algorithm was
designed to dynamically determine routes based on environmental variables. Its effectiveness was
evaluated in a simulation environment, where it showed a significant reduction in deviation from
the optimal route, as well as improvements in delivery time and transport network utilization [7].
However, it was also found that the algorithm incurs higher computational costs, which could be a
limiting factor when scaling to real-world logistics platforms.</p>
      <p>Currently, the algorithm is undergoing real-world testing to assess its performance when
working with incomplete or noisy input data, as well as its ability to adapt to dynamic transport
conditions. The results obtained are expected to refine the model and formulate a generalized
methodology suitable for implementation in real logistics platforms [8].</p>
    </sec>
    <sec id="sec-2">
      <title>2. Research Motivation and Problem Statement</title>
      <p>
        With the growth of C2C logistics, traditional routing methods are losing their effectiveness due to
the dynamic nature of demand, unpredictability of traffic conditions, and limited resources.
Algorithms such as Dijkstra’s algorithm or A* perform well in environments with static road
networks but fail to account for real-time changes in the transportation system [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In urban
logistics, where congestion, road closures, and fluctuating delivery requests are common, classical
methods cannot quickly adapt routes, leading to increased order fulfillment time and higher
logistics costs [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        One promising approach to addressing this challenge is the use of Reinforcement Learning (RL),
which enables dynamic delivery routing by adapting to environmental changes. RL agents learn
from historical and real-time data, leveraging a Markov Decision Process (MDP) model, where each
system state depends on previous actions [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Studies show that Deep Q-Networks (DQN) improve
routing efficiency by implementing an adaptive strategy for selecting optimal routes that considers
real-time traffic conditions [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. However, RL performance heavily depends on the accuracy of input
data, particularly in forecasting future demand and courier distribution across the city.
      </p>
      <p>To address this challenge, deep neural networks such as Long Short-Term Memory (LSTM) are
employed, which can analyze temporal dependencies and predict logistics system fluctuations [5].
The combination of RL and LSTM enables not only real-time adaptation but also the anticipation of
future delays and demand surges, which is crucial for stabilizing logistics operations. Research in
this field indicates that integrating predictive models reduces overall system load, balances order
distribution, and minimizes courier downtime [6].</p>
      <p>As part of this study, a system combining RL agents for optimal decision-making and LSTM
networks for demand forecasting was developed. Initial testing in a simulation environment
demonstrated that the adaptive approach significantly reduces route deviation compared to
traditional methods. Depending on the scenario, deviations ranged from 9.2% down to 3.3%,
whereas the heuristic baseline reached 31.7%. At the same time, the
adaptive algorithm required higher computational resources, with an average route computation
time of 50–60 ms, which was 2–3 times higher than that of traditional methods. Additionally, an
increased frequency of route recalculations was observed, reaching 18 updates in the most complex
scenarios, potentially leading to computational system overload.</p>
      <p>The next phase of the research involves validating the algorithm in real-world conditions,
allowing for an assessment of its performance when handling incomplete or noisy data and
evaluating its robustness against sudden changes in the logistics system. The results of real-world
experiments are expected to refine the model and establish a generalized adaptive routing
methodology for C2C logistics, making it suitable for scalability and integration into large logistics
platforms.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Developed Algorithm and Its Simulation-Based Evaluation</title>
      <p>
        The proposed routing algorithm is designed to enable adaptive route management for couriers in
C2C logistics. The primary goal is to develop a system capable of dynamically adjusting routes in
response to changes in demand and the state of the transportation network. To achieve this, the
algorithm integrates Reinforcement Learning (RL), which allows agents to make optimal decisions
based on both current and anticipated changes in the environment, and Long Short-Term Memory
(LSTM) for predicting the future distribution of orders [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        The overall structure of the algorithm consists of two key components (a sketch of how they
interact is given after this paragraph):
        • demand forecasting – utilizing LSTM to estimate the probability of new order appearances
in different city zones. This enables proactive courier redistribution, preventing overload or
idle time [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ];
        • reinforcement learning-based routing – an RL agent is trained to select optimal routes
considering not only the current network state but also predicted changes [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
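      <p>To make the interaction concrete, the following minimal Python sketch chains the two
components. The helper names (lstm_model.predict, rl_agent.act) and data shapes are illustrative
assumptions, not the authors' implementation:</p>
      <preformat>
# Illustrative two-component dispatch step: forecast demand, then let the
# RL agent route against the predicted (not just current) system state.
def dispatch_step(lstm_model, rl_agent, demand_history, network_state, couriers):
    # Stage 1: the LSTM estimates the probability of new orders per city zone,
    # enabling proactive courier redistribution.
    zone_demand = lstm_model.predict(demand_history)

    # Stage 2: the RL agent selects routes/assignments given the current
    # network state and the demand forecast.
    state = (network_state, tuple(zone_demand), tuple(couriers))
    return rl_agent.act(state)
      </preformat>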
      <p>
        This approach minimizes delivery delays, improves courier workload balance, and reduces
overall route length. Previous studies have shown that RL-based traffic flow management can
reduce average delivery time by 15–20% compared to classical routing methods [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Additionally,
LSTM-based demand forecasting decreases the number of empty trips and contributes to a more
balanced load distribution across the transportation system [5].
      </p>
      <p>Expected benefits of the algorithm:
• flexibility – the ability to adapt to changes in the transportation environment in real time;
• resource optimization – balancing workloads among couriers and reducing idle time;
• reduction in delivery time – through dynamic route adjustments.</p>
      <p>At the same time, the use of RL-based methods may lead to increased computational costs, as
the algorithm requires significant resources for training and decision-making. This issue has been
highlighted in real-time route optimization studies, emphasizing the need to balance prediction
accuracy and computation speed [6, 7]. Another critical aspect is algorithm scalability as the
number of delivery requests increases, which requires further investigation [8].</p>
      <p>Thus, the proposed approach aims to strike a balance between routing efficiency, computational
speed, and adaptability. The next subsection presents its mathematical model and formal
algorithmic principles.</p>
      <sec id="sec-3-1">
        <title>3.1. Mathematical Model of the Algorithm</title>
        <p>
          The proposed routing algorithm integrates Reinforcement Learning (RL) for adaptive route
management and Long Short-Term Memory (LSTM) for demand forecasting. Its operation can be
formalized as a Markov Decision Process (MDP), where each action affects the future state of the
system and the agent’s reward [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>
          The algorithm operates in two stages:
1. Demand forecasting using LSTM – predicts the future distribution of orders, enabling
proactive route planning [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
2. Adaptive decision-making by the RL agent – optimizes routes in real-time based on
updated data and demand forecasts [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.1.1. Formalization of the Learning Process</title>
        <p>The system is modeled as an MDP (Markov Decision Process), defined as a five-tuple
$(S, A, P, R, \gamma)$, where:
• $S$ – the set of states, including courier locations, active orders, and the current state of the
transportation network;
• $A$ – the set of actions available to the agent (e.g., assigning an order to a courier or
modifying a route);
• $P$ – the probability of transitioning between states based on the selected action;
• $R$ – the reward function that determines the efficiency of the chosen route;
• $\gamma$ – the discount factor that regulates the long-term optimization of decisions.</p>
        <p>At each step, the agent receives a reward $r_t$, which defines the effectiveness of the routing
process. The Q-function $Q(s, a)$ is used to estimate the quality of an action $a$ in state $s$. The
Q-value update follows the rule
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right],$$
where $\alpha$ is the learning rate and $\max_{a'} Q(s', a')$ represents the best expected value for the
new state $s'$.</p>
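        <p>A tabular Q-learning sketch of this update rule is given below; the state/action encodings
and hyperparameter values are assumptions for illustration, not the tuned configuration used in the
experiments:</p>
        <preformat>
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # learning rate, discount, exploration
Q = defaultdict(float)                    # Q(s, a), zero-initialized

def choose_action(state, actions):
    # epsilon-greedy: exploit the best known action with probability 1 - EPSILON
    if random.random() > EPSILON:
        return max(actions, key=lambda a: Q[(state, a)])
    return random.choice(actions)

def update(state, action, reward, next_state, next_actions):
    # Move Q(s, a) toward r + gamma * max_a' Q(s', a'), per the rule above.
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        </preformat>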
      </sec>
      <sec id="sec-3-3">
        <title>3.1.2. Demand Forecasting</title>
        <p>An LSTM network is used to predict future demand based on historical data. This allows the
identification of potentially overloaded regions, enabling route adjustments before imbalances
occur [5].</p>
        <p>The model uses the parameters $\theta = \{W_f, W_i, W_c, W_o, W_y, U_f, U_i, U_c, U_o, b_f, b_i, b_c, b_o, b_y\}$, which
determine the behavior of the LSTM. The state update in the LSTM model is formalized as follows:
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f), \quad (1)$$
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i), \quad (2)$$
$$\tilde{C}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c), \quad (3)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \quad (4)$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o), \quad (5)$$
$$h_t = o_t \odot \tanh(C_t), \quad (6)$$
$$\hat{y}_t = W_y h_t + b_y, \quad (7)$$
where $\sigma$ is the sigmoid activation function that regulates the flow of information between
memory states [6], $i_t$ is the input gate, and $f_t$ is the forget gate. The memory state $C_t$ is
updated according to formula (4), where $\odot$ denotes element-wise multiplication. The output gate
$o_t$ is calculated according to formula (5), the hidden state $h_t$ is updated according to (6), and
the demand forecast $\hat{y}_t$ is produced by (7).</p>
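        <p>The NumPy sketch below implements a single LSTM step following equations (1)-(7); the weight
shapes and parameter dictionary are assumptions for illustration, not the trained forecasting
model:</p>
        <preformat>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    # p holds the weight matrices W_*, U_* and biases b_* for each gate.
    f_t = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["bf"])    # (1) forget gate
    i_t = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["bi"])    # (2) input gate
    c_hat = np.tanh(p["Wc"] @ x_t + p["Uc"] @ h_prev + p["bc"])  # (3) candidate memory
    c_t = f_t * c_prev + i_t * c_hat                             # (4) memory state
    o_t = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev + p["bo"])    # (5) output gate
    h_t = o_t * np.tanh(c_t)                                     # (6) hidden state
    y_t = p["Wy"] @ h_t + p["by"]                                # (7) demand forecast
    return h_t, c_t, y_t
        </preformat>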
      </sec>
      <sec id="sec-3-4">
        <title>3.1.3. Optimization of the Learning Process</title>
        <p>To evaluate forecasting accuracy, the loss function is computed as the mean squared error between
the predicted values and the actual data [7]:
$$L(\theta) = \frac{1}{N} \sum_{t=1}^{N} (\hat{y}_t - y_t)^2. \quad (8)$$</p>
        <p>The model is trained using gradient descent, where parameter updates follow the rule [8]:
$$\theta \leftarrow \theta - \eta \nabla_\theta L(\theta), \quad (9)$$
where $\eta$ is the learning rate and $\nabla_\theta L(\theta)$ is the gradient of the loss function with
respect to the parameters.</p>
        <p>The training process is repeated for each epoch until the specified number of epochs is reached
or the loss function stabilizes. After completing all epochs, the optimized parameters are
determined as follows [9]:
$$\theta^{*} = \arg\min_{\theta} L(\theta). \quad (10)$$</p>
        <p>Ultimately, the algorithm generates a demand forecast $\hat{y}_t$, which is used to update the system
state and optimize routing decisions. The proposed approach is expected to reduce route deviations
from optimal values, decrease average delivery time, and enhance system adaptability.</p>
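        <p>The following sketch shows the training loop of (8)-(10) on a simplified linear forecaster
(an assumption made for brevity; the paper trains an LSTM). The loop structure, i.e. the MSE loss,
the gradient-descent update, and the epoch/stabilization stopping rule, is the same:</p>
        <preformat>
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                  # assumed demand features
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=200)

theta = np.zeros(8)                            # model parameters
eta, n_epochs, tol = 0.05, 500, 1e-8           # learning rate, epochs, tolerance
prev_loss = np.inf
for epoch in range(n_epochs):
    y_hat = X @ theta
    loss = np.mean((y_hat - y) ** 2)           # (8) mean squared error
    grad = 2.0 / len(y) * (X.T @ (y_hat - y))  # gradient of the loss
    theta -= eta * grad                        # (9) gradient-descent update
    if tol > abs(prev_loss - loss):            # stop once the loss stabilizes
        break
    prev_loss = loss
theta_star = theta                             # (10) approximate argmin of L(theta)
        </preformat>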
      </sec>
      <sec id="sec-3-5">
        <title>3.2. Results of Experiments in a Simulation Environment</title>
        <p>Evaluating the efficiency of adaptive routing algorithms is a critical step in their validation before
deployment in real logistics systems. Previous studies have shown that combining reinforcement
learning (RL) with demand forecasting significantly improves the efficiency of urban logistics
platforms. In [9], it was noted that RL-based algorithms enable flexible decision-making, which
enhances the utilization of transportation resources in urban environments. The study in [10]
demonstrated that integrating RL with optimization methods reduces overall delivery delays by
13%.</p>
        <p>At the same time, as highlighted in [11], traditional routing methods, such as Vehicle Routing
Problem (VRP) algorithms, are significantly less effective in scenarios with highly variable demand.
In [12], it was demonstrated that LSTM-based demand forecasting reduces the number of empty
trips, leading to increased overall logistics network efficiency.</p>
      </sec>
      <sec id="sec-3-6">
        <title>3.2.1. Testing Methodology</title>
        <p>In this study, the developed algorithm was tested in a simulation environment that mimics an
urban transportation network. The simulator models dynamic delivery demand, changing road
conditions, and courier behavior variability. The primary objective of the testing was to evaluate
the algorithm's ability to adapt routes in real time and compare its effectiveness with traditional
routing methods. The Python platform with AnyLogic libraries for visualization and calculations
was used. Real-world city maps, historical demand data, transportation routes, and traffic served as
the basis for the simulations. The test scenarios covered static and dynamic delivery conditions
that took into account factors such as traffic, weather conditions, and variable demand. The
simulations involved three types of agents: couriers, orders, and vehicles, which could adapt their
behavior depending on the state of the environment.</p>
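        <p>For reference, a minimal representation of the three agent types might look as follows; the
field names are assumptions for illustration, not the actual simulator schema:</p>
        <preformat>
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Courier:
    courier_id: int
    location: Tuple[float, float]   # position in the simulated city
    load: int = 0                   # number of active orders

@dataclass
class Order:
    order_id: int
    pickup: Tuple[float, float]
    dropoff: Tuple[float, float]
    placed_at: float                # simulation time of placement

@dataclass
class Vehicle:
    vehicle_id: int
    speed_kmh: float                # adapted to traffic and weather
    route: List[Tuple[float, float]] = field(default_factory=list)
        </preformat>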
        <p>Three different approaches were considered:</p>
        <p>Traditional heuristic algorithm – constructs routes based on the shortest path without
accounting for real-time changes in the transportation system.</p>
        <p>Vehicle Routing Problem (VRP) optimization – a classical approach to optimizing route
distribution among couriers.</p>
        <p>Adaptive method (LSTM + RL) – the proposed algorithm, which integrates demand
forecasting and RL-based adaptive learning for dynamic route adjustment.</p>
        <p>The efficiency of the algorithms was assessed based on several key metrics. In particular, the
average route length was analyzed; its reference value of 12 km was obtained from the optimal
solution of the Vehicle Routing Problem (VRP) and is consistent with real-world urban logistics
data.</p>
        <p>The experimental results are presented in Table 1.</p>
        <p>The following metrics were used to assess efficiency (a sketch of their computation is given
after this list):
• delivery time (s) – the average time required to complete an order;
• average route computation time (ms) – the speed of route recalculations;
• number of route recalculations – the frequency of route updates during delivery execution;
• route length (km) – the total distance traveled by the courier while completing an order;
• deviation from the reference route (%) – the extent to which the actual route deviates from the
theoretically optimal route.</p>
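        <p>A sketch of how these metrics could be aggregated from per-delivery records is shown below;
the record keys and the wiring of the 12 km reference value are assumptions for illustration:</p>
        <preformat>
# Hypothetical per-delivery records: delivery_time_s, compute_time_ms,
# recalculations, route_km.
def evaluate(deliveries, reference_length_km=12.0):
    n = len(deliveries)
    return {
        "avg_delivery_time_s": sum(d["delivery_time_s"] for d in deliveries) / n,
        "avg_compute_time_ms": sum(d["compute_time_ms"] for d in deliveries) / n,
        "avg_recalculations": sum(d["recalculations"] for d in deliveries) / n,
        "avg_route_length_km": sum(d["route_km"] for d in deliveries) / n,
        # deviation from the reference route, in percent
        "avg_deviation_pct": 100.0 * sum(
            abs(d["route_km"] - reference_length_km) / reference_length_km
            for d in deliveries) / n,
    }
        </preformat>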
      </sec>
      <sec id="sec-3-7">
        <title>3.2.2. Analysis of the Obtained Results</title>
        <p>The obtained results demonstrate that the adaptive approach significantly outperforms traditional
routing methods across all key performance indicators. In particular, using LSTM for demand
forecasting reduced the average delivery time by 20–25% compared to the heuristic method and by
13–15% compared to VRP. Additionally, the lowest deviation from the reference route was
achieved—ranging from 9.2% in the initial scenario to 3.3% with a higher number of route updates.</p>
        <p>However, as noted in [13], using RL in urban logistics tasks incurs significant computational
costs. In our experiment, the average route computation time for the adaptive approach was 50–60
ms, which is 2–3 times higher than that of traditional methods. Furthermore, the number of route
recalculations increased, reaching 18 updates in complex scenarios, exceeding the acceptable
threshold of 15 updates. This may indicate a risk of excessive route recomputation in high-demand
scenarios, which is also confirmed in [14].</p>
        <p>These findings highlight the potential of the adaptive approach for real-world urban logistics
systems. However, to enable full-scale implementation, route recomputation processes must be
optimized, and computational overhead must be reduced, which will be the focus of future
research. The next step is experimental testing of the algorithm in real-world conditions, allowing
for an assessment of its resilience to unpredictable changes in the urban transportation network.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Validation in a Real-World Environment</title>
      <p>Following the positive results obtained in the simulation environment, the next stage involved
testing the algorithm under real-world urban logistics conditions. Field experiments are crucial, as
real environments introduce additional factors that are difficult to simulate, such as incomplete or
noisy data, unpredictable traffic variations, and fluctuating courier behavior [15].</p>
      <p>Previous research has shown that reinforcement learning (RL) methods can be effectively
applied to real transportation systems, but their performance heavily depends on the quality of
input data. In [16], it was demonstrated that dynamic route adaptation in real-time can reduce
average delivery time by 10–15%, even in cases of inaccurate demand predictions.</p>
      <sec id="sec-4-1">
        <title>4.1. Experimental Validation Methodology</title>
        <p>The real-world trials were conducted using an operational C2C logistics platform in a major city.
The algorithm was integrated into the courier management system, where couriers received
updated routes through a mobile application. To monitor the algorithm’s performance, GPS
trackers were used to track courier movements, along with an analytics system that collected
real-time delivery execution data.</p>
        <p>During the experiment, the algorithm was tested in two modes:
1. Static routing (baseline approach) – routes were generated at the beginning of the workday
and remained unchanged.
2. Adaptive routing (LSTM + RL) – the algorithm dynamically adjusted routes based on
environmental changes and predicted demand.</p>
        <p>The following metrics were used to evaluate performance (a sketch of how they can be computed
is given after this list):
• average delivery time – time from order acceptance to final delivery;
• number of deviations from the planned route – measures how often routes were modified
during order execution;
• resource utilization rate – indicates how evenly workloads were distributed among
couriers.</p>
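        <p>A sketch of computing these field metrics follows. The utilization definition below (one
minus the coefficient of variation of courier workloads) is one plausible reading of workload
evenness, not the authors' exact formula:</p>
        <preformat>
from statistics import mean, pstdev

# Hypothetical inputs: per-order records with accepted_at/delivered_at
# timestamps and a route_changes counter, plus a list of courier workloads.
def field_metrics(orders, courier_loads):
    avg_delivery_time = mean(o["delivered_at"] - o["accepted_at"] for o in orders)
    route_deviations = sum(o["route_changes"] for o in orders)
    cv = pstdev(courier_loads) / mean(courier_loads)  # workload spread
    return {
        "avg_delivery_time_s": avg_delivery_time,
        "route_deviations": route_deviations,
        "resource_utilization": max(0.0, 1.0 - cv),
    }
        </preformat>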
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Preliminary Testing Results</title>
        <p>It is expected that the results of real-world experiments will follow similar trends observed in
simulation-based trials, where the adaptive algorithm demonstrated reduced average delivery time
and route deviation. However, as noted in [17], real-world traffic conditions can introduce
significant performance variations due to external factors such as weather conditions, traffic
accidents, or other unforeseen disruptions.</p>
      <p>In previous studies [15], it was highlighted that the efficiency of RL-based algorithms in
real-world environments depends on model update speed, which can become a critical factor in
large-scale systems with high request volumes. In our case, the experiment focuses on balancing routing
adaptation accuracy and computational costs, as excessive route recalculations in real-world
scenarios may slow down logistics system operations.</p>
        <p>The next step after data collection and analysis will be algorithm parameter optimization and
the development of a generalized methodology for implementing the approach in scalable C2C
logistics systems.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Formalization of the Methodology</title>
      <p>After conducting real-world experimental trials, the next step is to develop a generalized adaptive
routing methodology for C2C logistics. The formalization of this methodology will be based on
analyzing both simulation and field test results, particularly in terms of demand prediction
accuracy, algorithm stability, and computational efficiency.</p>
      <p>One of the key challenges is finding the right balance between routing accuracy and
computational overhead. High algorithm adaptability, which minimizes deviation from the
reference route, is associated with a significant increase in the number of recalculations. This
highlights the need to develop criteria for dynamically adjusting route update frequency, which
will reduce computational load without significantly compromising efficiency.</p>
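      <p>One possible criterion for such dynamic adjustment is sketched below; the threshold schedule
is an illustrative assumption, while the cap of 15 recalculations is taken from the simulation
experiments:</p>
      <preformat>
# Recompute a route only when the predicted gain exceeds a load-dependent
# threshold; the schedule below is a sketch, not a result of the paper.
def should_recalculate(predicted_gain_pct, recalcs_so_far,
                       base_threshold=2.0, max_recalcs=15):
    if recalcs_so_far >= max_recalcs:      # hard cap from the experiments
        return False
    # Raise the bar as recalculations accumulate, trading a little route
    # quality for lower computational load.
    threshold = base_threshold * (1.0 + recalcs_so_far / max_recalcs)
    return predicted_gain_pct > threshold
      </preformat>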
      <p>Data quality dependency and sensitivity to training parameters also affect the overall
performance of the system. To overcome these challenges, it is advisable to implement cloud
computing and sensor networks to collect relevant data in real time.</p>
      <p>Further development of the methodology may include the implementation of multi-agent
systems for courier coordination, the integration of hybrid forecasting models, and the
development of lightweight RL algorithms aimed at small platforms.</p>
      <p>The methodology will also account for possible model parameter variations based on request
density, urban infrastructure characteristics, and technological constraints of logistics platforms.
To address this, a flexible algorithm configuration system will be designed, allowing the model to
adapt to specific operational conditions.</p>
      <p>The results of this formalization will be used for further implementation of the methodology in
real logistics systems, as well as for developing recommendations on scalability and optimization in
high-load transportation networks.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>This study explored an adaptive routing approach for C2C logistics based on reinforcement
learning (RL) and demand forecasting. The developed algorithm was tested in a simulation
environment, allowing an evaluation of its effectiveness compared to traditional methods. The
results demonstrated a significant reduction in deviation from the reference route, indicating high
adaptability of the approach. However, an increase in computational costs and the number of route
recalculations was observed, which could be a critical factor in system scalability.</p>
      <p>Currently, the experimental validation of the algorithm in a real-world environment is ongoing.
Field trials are expected to assess the actual impact of the algorithm on delivery time, routing
stability, and resource utilization efficiency. The obtained data will be used to further optimize the
algorithm and formalize a generalized adaptive routing methodology.</p>
      <p>Future research will focus on reducing the algorithm’s computational costs, improving route
update strategies, and testing the approach on various logistics platforms. Additionally, a dynamic
parameter tuning mechanism will be developed to adjust the algorithm’s configuration based on
system load and external factors.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] Y. Yan, A. H. Chow, C. P. Ho, Y. H. Kuo, Q. Wu, C. Ying, Reinforcement
Learning for Logistics and Supply Chain Management: Methodologies, State of the Art, and Future
Opportunities. Transportation Research Part E: Logistics and Transportation Review (2022).
doi:10.1016/j.tre.2022.102712.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] H. Liu, J. Zhang, Z. Zhou, Y. Dai, L. Qin, A Deep Reinforcement
Learning-Based Algorithm for Multi-Objective Agricultural Site Selection and Logistics
Optimization Problem. Applied Sciences 14 (2024) 8479. doi:10.3390/app14188479.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] H. Cai, P. Xu, X. Tang, G. Lin, Solving the Vehicle Routing Problem with
Stochastic Travel Cost Using Deep Reinforcement Learning. Electronics 13 (2024) 3242.
doi:10.3390/electronics13163242.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] O. I. Akinola, Adaptive location-based routing protocols for dynamic
wireless sensor networks in urban cyber-physical systems. Journal of Engineering Research and
Reports 26(7) (2024) 424-443. doi:10.9734/jerr/2024/v26i71220.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>