Intelligent Battery Management for Aquatic Drones Based on Task-Difficulty-Driven POMDPs

Alberto Castellini*, Jason J. Blum, Domenico D. Bloisi, and Alessandro Farinelli

Verona University, Department of Computer Science, Strada Le Grazie 15, 37134 Verona, Italy, {name.surname}@univr.it

Abstract. Autonomous drones typically have limited battery capacity, which presents a major challenge for persistent, long-term deployments. In the context of water quality monitoring, aquatic drones must navigate rivers and lakes to collect real-time data, and their battery consumption is heavily influenced by dynamic environmental factors such as water current and wind. Intelligent battery management is therefore a requirement for the success of these missions. We propose a formalization of the battery management problem in terms of Partially Observable Markov Decision Processes (POMDPs), in which the agent modulates the energy provided to its propellers. The drone can estimate the "difficulty" of traversing an interval by observing the environment. An important aspect of this formalization is the prediction of the difficulty of future intervals from current observations by exploiting a priori geometric information, a mechanism we call Task Difficulty Propagation (TDP). We investigate variations of this approach and analyze their performance.

1 Introduction

Autonomous aquatic drones are increasingly used for water monitoring. A key factor for the success of data acquisition campaigns is mission awareness, which comprises three main elements [2]: knowledge of mission objectives, internal self-situational awareness, and external self-situational awareness. Given the limited on-board energy and the requirements of long-term autonomy and reliability, the ability to forecast mission-specific energy consumption is necessary to complete missions reliably, smoothly, and efficiently. In this work we focus on battery management, which is an aspect of self-situational awareness.

Existing approaches for estimating battery consumption in autonomous systems can be divided into two main categories, namely decision models [3,8,2] and predictive models [4,11,9]. All these approaches deal with different aspects of a common problem: making optimal decisions for battery management in uncertain environments. In this paper, we propose a methodology based on Partially Observable Markov Decision Processes (POMDPs) [6], a decision model that combines the strengths of Hidden Markov Models with those of Markov Decision Processes by capturing both dynamics that depend on unobserved states and the effects of decisions over time.

We apply our method to the following case study: given i) a path made of four perpendicular segments (Figure 1.c), in which each segment has a specific degree of difficulty (i.e., a measure of the forward speed achieved for a given amount of energy consumption), and ii) an initial amount of battery energy, the agent has to decide, at each time instant, how much engine power to apply in order to minimize the time needed to reach the end of the path and to maximize the probability of completing the mission before battery depletion. To this end, POMDPs are extended with a novel simulation strategy called Task Difficulty Propagation (TDP), which is based on Markov Random Fields (MRFs) and allows us to represent the dependencies between segment difficulties and to propagate this information in order to improve the belief over the difficulties of the entire path.
First experiments show promising enhancements to mission performance. Future developments aim at using the method on the INTCATCH [5] drones (Fig. 1.a).

Fig. 1: a) INTCATCH system for autonomous monitoring of catchments; b) real path from a water monitoring campaign; c) shape and cosine similarity matrices for a square path; d) POMDP model for battery management in water monitoring drones.

2 Methods

A POMDP is a tuple (S, A, O, T, Z, R, γ), where S is a finite set of states, A is a finite set of actions, Z is a finite set of observations, T: S × A → Π(S) is the state-transition model, O: S × A → Π(Z) is the observation model, R: S × A → ℝ is the reward function and γ ∈ [0, 1) is the discount factor. The goal of an agent based on POMDPs is to maximize its expected total reward E[Σ_{t=0}^{∞} γ^t R(s_t, a_t)] by choosing the best action a_t in each state s_t at time t.

Figure 1.d shows the proposed POMDP for battery management as a graph, where states are represented by circles, actions by rectangles and conditional dependencies between states by arrows. The states are: i) battery voltage interval B ∈ {b0, b1, b2, b3}, where b0 represents an empty battery; ii) speed interval V ∈ {v1, v2, v3}; iii) distance to path end D ∈ {d0, d1, ..., d8}, where d8 is the starting interval of the path (i.e., the maximal distance) and d0 is the end point of the path (i.e., the goal); iv) time interval T ∈ {t1, t2, t3, t4, t5}, where t1 is the time interval in which the path starts; v) current segment S ∈ {s1, s2, s3, s4}; vi) difficulty of segments F1, F2, F3, F4 with Fj ∈ {L, M, H}, where L=low, M=medium, H=high, and difficulties may depend on water flow or other environmental factors; vii) difficulty of the current segment Fc ∈ {L, M, H}; viii) engine power P ∈ {L, M, H}, which corresponds to a uniform discretization of the combined effort of the drone's propellers. States B, V, D, T and S are observable, while states F1, ..., F4 and Fc are hidden, hence the model has mixed observability [10]. Rewards depend on engine power (i.e., -0.01 for low power, -0.02 for medium power and -0.05 for high power), time of arrival (i.e., +500 for arriving at time t5, +1000 for arriving at time t4, +1500 for arriving at time t3, +2000 for arriving at time t2, +2500 for arriving at time t1) and battery depletion before arrival (i.e., -5000). The discount factor is γ = 0.95.

The five hidden variables can assume three possible values each (i.e., L, M or H), for a total of 3^5 = 243 configurations of the hidden state (e.g., f = (L, M, H, L, L)). The belief b_t provides the probability over all possible configurations at time t. It is a vector (b_t^1, ..., b_t^243) such that b_t^j ≥ 0 and Σ_{j=1}^{243} b_t^j = 1. We set the initial belief b_0 to the uniform distribution (1/243, ..., 1/243).

In order to represent the dependencies between segment difficulties we define two elements: i) a cosine similarity matrix, which provides a measure of geometric similarity between pairs of segments in the path (see matrix K in Figure 1.c), and ii) a pairwise MRF to compute a joint probability function over configurations of segment difficulties (see the graph in Figure 1.c). The MRF is used to factorize the joint probability of segment difficulties as a product of potential functions over the maximal cliques of the graph.
The joint probability is then written as p(f|θ) = (1/Z(θ)) Π_{q∈Q} ψ_q(f_q|θ_q), where f is a vector of difficulties (e.g., (H, M, L, L)), θ is a parametrization of the MRF, Q is the set of (maximal) cliques of the graph, ψ_q(f_q|θ_q) is a potential function for clique q and Z(θ) is a normalization factor called the partition function. We use a simplified version of MRF, called pairwise MRF, in which the parametrization is restricted to the edges of the graph rather than to the maximal cliques, and we conveniently express the potentials as exponentials (Boltzmann distribution). In this way the product of potentials in p(f|θ) can be computed by summing the energies of the individual edges and exponentiating the result. Since our difficulty variables are discrete, we represent the energy functions as tables of (non-negative) numbers representing the relative "compatibility" between the different assignments of segment difficulties. In particular, given a pair of segments (s_i, s_j), i, j ∈ {1, 2, 3, 4}, having cosine similarity equal to 1, 0 or -1, the energy functions of the corresponding pair of difficulties (f_i, f_j), f_i, f_j ∈ {L, M, H}, are defined as E_1(v, w), E_0(v, w) and E_{-1}(v, w), where v = I(f_i), w = I(f_j) and I(·) is the index function. To compute the energy function of pairs of segments having generic cosine similarity k ∈ [-1, 1] we interpolate the parameters in E_1(v, w), E_0(v, w) and E_{-1}(v, w) by quadratic polynomials.

The joint probability computed by the MRF is finally used to update the beliefs generated step-by-step by the TDP simulator according to the formula b̄_t^j = α · b_t^j + (1 − α) · b_t^j · p(f^j), where t ∈ N is the time step, j ∈ {1, ..., 243} is the index of the difficulty configuration (i.e., hidden state), α ∈ [0, 1] is a weighting factor, b_t^j is the probability of the j-th difficulty configuration provided by the standard simulator, p(f^j) is the joint probability of the j-th difficulty configuration provided by the MRF, and b̄_t^j is the probability of the j-th difficulty configuration provided by the TDP simulator.

3 Results

We implemented the POMDP model in a POMDPX XML file [1] and computed a policy for this model using the pomdpsol function of the SARSOP solver [7] in the APPL Toolkit [1]. Then we simulated the model 100 times using the pomdpsim function of the APPL Toolkit, thus obtaining 100 time evolutions of the system. In all these simulations the true hidden state was manually fixed to f* = (H, H, L, L). We call these time evolutions standard (STD) simulations, since they were computed by the standard simulator and do not consider the path geometry to improve the belief over segment difficulties. Each simulation has 300 time steps and stores information about the evolution of the observable state variables, the belief over the hidden state, the actions and the rewards.

Subsequently, we modified the pomdpsim simulator in two ways: i) by forcing the belief to the true hidden state (i.e., the probability distribution has all the mass on the true state and zero probability for the other states); we call this simulator oracle (ORC) since it has complete information about the true hidden state; ii) by implementing the TDP approach presented above; we call this simulator TDP since it considers path geometry to improve the belief update over segment (i.e., task) difficulties.
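To make the TDP modification concrete, the following Python sketch puts together the three ingredients described in Section 2: the cosine similarity matrix K of a square path, the pairwise MRF with quadratically interpolated energy tables, and the reweighting of the belief by p(f). It is a minimal illustration under stated assumptions, not the implementation used in the experiments: the energy-table values, the fully connected edge set, the segment direction vectors, the weighting factor α and the final renormalization of the belief are placeholders chosen here for the example, and the sketch tracks only the four segment difficulties (3^4 = 81 configurations), leaving out Fc.

import itertools
import numpy as np

# Direction vectors of the four segments of a square path traversed in order
# (assumed orientation; only the pairwise angles matter).
directions = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
unit = directions / np.linalg.norm(directions, axis=1, keepdims=True)
K = unit @ unit.T                       # K[i, j] = cosine similarity of s_i, s_j

# Difficulty levels L, M, H are indexed 0, 1, 2 (the index function I(.)).
# Illustrative "compatibility" tables for cosine similarity 1, 0 and -1:
# a larger value means a more compatible pair of difficulties.
E_plus1 = np.array([[2.0, 1.0, 0.0],    # parallel segments: similar difficulty
                    [1.0, 2.0, 1.0],
                    [0.0, 1.0, 2.0]])
E_zero = np.ones((3, 3))                # perpendicular segments: uninformative
E_minus1 = E_plus1[::-1, :]             # antiparallel segments: opposite difficulty

def energy(k, v, w):
    """E_k(v, w) obtained by quadratic interpolation through k = -1, 0, 1."""
    a = 0.5 * (E_plus1[v, w] + E_minus1[v, w]) - E_zero[v, w]
    b = 0.5 * (E_plus1[v, w] - E_minus1[v, w])
    c = E_zero[v, w]
    return a * k * k + b * k + c

# Joint probability p(f) over the 3^4 = 81 segment-difficulty configurations,
# with potentials exp(E) on every pair of segments (assumed edge set).
configs = list(itertools.product(range(3), repeat=4))
edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]
log_pot = np.array([sum(energy(K[i, j], f[i], f[j]) for (i, j) in edges)
                    for f in configs])
p_f = np.exp(log_pot - log_pot.max())   # subtract the max for numerical stability
p_f /= p_f.sum()                        # division by the partition function Z

def tdp_update(belief, p_f, alpha=0.5):
    """TDP reweighting b̄^j = α·b^j + (1-α)·b^j·p(f^j); the renormalization at
    the end is an extra step added here to keep the belief a distribution."""
    updated = alpha * belief + (1.0 - alpha) * belief * p_f
    return updated / updated.sum()

belief = np.full(len(configs), 1.0 / len(configs))   # uniform initial belief
belief = tdp_update(belief, p_f, alpha=0.5)
print(configs[int(np.argmax(belief))])               # most compatible configuration

In the actual experiments this reweighting is applied at every step inside the modified pomdpsim simulator, with b_t provided by the standard belief update.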
We performed 100 simulations with the ORC and TDP simulators, respectively, and compared the results with those achieved by the STD simulator in terms of belief evolution, average number of steps to reach the goal, and average number of battery failures.

Figure 2.a shows the evolution of the beliefs in the three simulation cases: columns represent the different simulation strategies, namely STD, TDP and ORC, while rows represent segments 1, 2, 3 and 4 of the square path. Each chart shows the average belief (over 100 simulations) for a particular simulation strategy in a specific segment. A consequence of the improved belief update strategy in TDP is an improvement in performance in terms of (reduced) number of steps to reach the end of the path. This is observable in Figure 2.b, which shows the average number of steps required by each approach to reach the end of segments s1, s2, s3 and s4. We observe that using TDP (blue line) the agent is able to reach the end of the path with fewer steps than using STD (red line) and with a similar failure rate (i.e., the percentage of runs in which the battery is depleted before the end of the path). ORC has the best performance, with 96.4 steps on average to reach the end of the square and a failure rate of 24%. The second best performance is that of TDP, with 102.1 steps and a failure rate of 22%. Both approaches outperform STD, which needs 111.6 steps and has a failure rate of 22%.

A key extension of the proposed approach concerns scaling to more complex scenarios. To this end we are currently exploring the use of Partially Observable Monte Carlo Planning (POMCP) methods [12].

Acknowledgments

This work is partially funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 689341.

Fig. 2: a) Time evolution of the belief state. Red lines identify the true hidden state (H, H, L, L). Asterisks in the notation H∗∗∗ stand for any difficulty value. The beliefs of all simulators evolve towards the true hidden state, but ORC outperforms TDP, and TDP outperforms STD. b) Number of steps needed to reach the end of each segment. Averages are computed over 100 simulations. ORC (green) reaches the end first, then TDP and finally STD.

References

1. APPL Toolkit. http://bigbird.comp.nus.edu.sg/pmwiki/farm/appl/.
2. V. Berenz, F. Tanaka, and K. Suzuki. Autonomous battery management for mobile robots based on risk and gain assessment. Artificial Intelligence Review, 37(3):217–237, 2012.
3. F. Dressler and G. Fuchs. Energy-aware operation and task allocation of autonomous robots. In Proc. of the 5th Int. Workshop on Robot Motion and Control, pages 163–168, 2005.
4. A. Hamza and N. Ayanian. Forecasting battery state of charge for robot missions. In SAC, pages 249–255, 2017.
5. INTCATCH H2020 EU project website. http://www.intcatch.eu/.
6. L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and acting in partially observable stochastic domains. Artif. Intell., 101(1-2):99–134, 1998.
7. H. Kurniawati, D. Hsu, and W. S. Lee. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In Robotics: Science and Systems, 2008.
8. M. Lee, M. Tarokh, and M. Cross. Fuzzy logic decision making for multi-robot security systems. Artificial Intelligence Review, 34(2):177–194, 2010.
9. J. LeSage and R. Longoria. Characterization of load uncertainty in unstructured terrains and applications to battery remaining run-time prediction. J. Field Rob., 30(3):472–487, 2013.
10. S. C. W. Ong, S. W. Png, D.
Hsu, and W. S. Lee. Planning under uncertainty for robotic tasks with mixed observability. Int. J. of Robotics Research, 29(8):1053–1068, 2010.
11. A. Sadrpour, J. Jin, and A. G. Ulsoy. Mission energy prediction for unmanned ground vehicles. In ICRA, pages 2229–2234, 2012.
12. D. Silver and J. Veness. Monte-Carlo planning in large POMDPs. In Advances in Neural Information Processing Systems 23, pages 2164–2172, 2010.